Designer's Difficulties

Struggling with the design and construction of the world's most powerful
computer has never been easy. And in many ways the nature of the struggle
has been constant through time. It has taken at least four or five years to get
every major new machine going. Typically, financial crises arise, regardless of
whether the undertaking is in a university or industrial setting. And the
speedup over the fastest previous machine has never been much more than a factor
of ten, often much less. Still, the cumulative results from the mid 1940's to
1970 have been an impressive speedup factor of 10.

Just as impressive, but more bewildering, is the growth in complexity
of computer organization. Early machines contained a few thousand relays or
vacuum tubes, but modern ones are approaching 10 transistors. One of the
designer's main trade-off problems has always been between the number of parts
he uses and the speed of each individual part. Since for a fixed cost he always
wants as fast a machine as possible, he can choose a simple organization with
very fast parts or a more complex organization with slower parts. The fewer the
parts the higher the reliability, but faster parts cost more than slow ones and
producing them may be very difficult. The designers of the most powerful
machines have always pushed both reliability and cost to their limits. One
reason for this is that from the early 1950's on, there have usually been two or
more groups in competition to build the next big machine.

For the moment we can leave the definition of "most powerful machine"
at the intuitive level of "fastest and biggest." But modern machines have several
goals in addition to these traditional ones. From the standpoint of operating
cost, maximum "throughput" is desired. In other words, a computing center manager
would like to collect fees for as much of his machine time as possible. This
becomes a difficult matter when complex operating systems and input/output
equipment are used, since these may consume a good deal of overhead time. Another
goal which is becoming more difficult to achieve is low "turnaround time" for users.
When many individuals are attempting to use a common central facility, the system
response time may get very long. To a large extent these newer problems are
related to the software provided for big machines. Thus the modern design of super
machines must really be the design of a hardware-software system.

Earlier we remarked that certain machine design difficulties have not
changed in time. The overall design of systems has in fact become more complex due
to the introduction of software design questions on top of hardware or logical
design. No large machine has been a one man show. Thus the designer-builder
interface has very often been the source of much difficulty. These difficulties include
personality clashes, technical disagreements, failures to communicate, etc.
Interfacing the designers and implementers of software is no easier than with hardware
people and indeed seems to be very much harder. Furthermore, now the hardware and
software designers must talk to each other. Currently, large machine projects may
involve literally hundreds of professional people. Usually, the more, the worse.

Finally, in our jeremiad of big system design, the bitterest pill of all
for imaginative designers is the "design freeze." Having kept open all options as
long as possible, the designers must make their final decisions and stop designing.
The several year construction period which follows is similar to a gestation period
in that changes in the design are virtually impossible and if attempted may prove
fatal. In reality, of course, there are always some mistakes in the design and as
many of these as possible are removed. These changes often cause major expenditures
of money and sometimes degrade the machine's performance.

In this introduction we shall quickly sketch the history leading to modern 
digital computers. We do this for several reasons. First, in spite of their great 
number of parts, computers are quite simple in functional terms and it is interesting 
to learn when various ideas were first proposed or implemented. It is also revealing 
to note how few really big innovations have occurred. Finally, we cannot resist 
telling the story of Charles Babbage.



The World's First Computer Designer 

Although present machines are direct descendants of ideas of the
mid-1930's, Babbage designed his Analytical Engine, the world's first general purpose
digital computer, nearly 150 years ago. He also built a prototype of the world's
first special purpose digital computer, his Difference Engine, which he evidently
first thought about in 1812, ten years after the invention of the
steamboat. The ideas that he and a few colleagues had about computers and
programming over some 30 years are overwhelming. They touched on a great many of
the ideas used in modern computers. Nor were his thoughts limited to computers,
as we shall see later.

Not surprisingly, Babbage had to face many of the above mentioned
difficulties that present day designers encounter. Several of these proved
so overwhelming that he never finished anything but a prototype of the Difference
Engine. His major problem seems to have been a too ambitious plan, a block
over which every designer must stumble at least once. This led to financial
problems and difficulties with his chief engineer.

Babbage himself wrote down few details about his machines and it was
said that his lectures about machines were pretty much incomprehensible.
Fortunately, an Italian army officer named Menabrea, who sat through a series of
lectures Babbage gave in Turin in 1840, published a good account of the
Analytical Engine. This was later translated into English and, at Babbage's suggestion,
annotated by his colleague Ada Augusta, Countess of Lovelace. On reading this
paper as well as several by Babbage one is depressed by the relatively small
progress made by thousands of modern computer scientists. Or, to be more correct,
one is annoyed by how often the same problem is discovered, worked on, solved,
and breathlessly discussed in the current literature.

Babbage had been motivated as early as 1812 to consider a machine
which could evaluate polynomials by the method of differences. He was annoyed
by the fact that human computers of astronomical and other tables were usually
people of some intellectual accomplishment but that such computations really
required only mechanical skills. He was also bothered by the large numbers of
errors occurring in published tables as well as errata in errata sheets. So
between 1820 and 1822 he built a six decimal digit Difference Engine capable of
evaluating any second degree polynomial. Initial conditions were placed on
wheels by hand. Spurred by his success with this project he obtained Government
funds for a 26 digit, sixth degree Difference Engine. This was a very much
more complex machine. It was to have automatic rounding, provision for double
precision arithmetic, various alarm (interrupt and completion) bells, as well
as a method for engraving copper plates for printing the computed results. The
latter would preclude transcription errors. Concerned about inherent mechanical
errors, Babbage arranged various roller and conical bearings that would jam if
certain mechanical tolerances were exceeded. If completed, the Difference Engine
would certainly have revolutionized the tabulation of mathematical functions.
It must also be noted that Babbage was developing a complex design notation for
communicating his ideas to his engineering and construction people.
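
The method of differences mentioned above can be illustrated with a short
sketch. It is a modern illustration only, not a model of Babbage's mechanism;
the function names and the sample polynomial are our own assumptions. The first
function plays the part of setting the initial conditions on the wheels by hand,
and the second produces further table entries using additions alone.

```python
def difference_table(poly, start, degree):
    """Return f(start) followed by its forward differences up to `degree`."""
    values = [poly(start + k) for k in range(degree + 1)]
    column = []
    while values:
        column.append(values[0])
        # Replace the list by the differences of neighbouring entries.
        values = [b - a for a, b in zip(values, values[1:])]
    return column


def tabulate(column, count):
    """Produce `count` successive function values using additions only."""
    regs = list(column)  # regs[0] = value, regs[1] = first difference, ...
    table = []
    for _ in range(count):
        table.append(regs[0])
        # Each register absorbs the next higher difference.
        for i in range(len(regs) - 1):
            regs[i] += regs[i + 1]
    return table


if __name__ == "__main__":
    initial = difference_table(lambda x: x * x + x + 41, start=0, degree=2)
    print(tabulate(initial, 5))
```

For a second degree polynomial three registers suffice, which is why a machine
of this kind needs no multiplication at all once the initial column is set.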

This project dragged on for 10 years until 1833 consuming 17,000 pounds
of English government money and perhaps as much of Babbage's own fortune. During
this period Babbage engaged in a series of fund raising activities and became
increasingly at odds with his chief engineer Clement. Evidently he proposed many
design changes but the exact details of the collapse of the project do not seem
to have been recorded. In any case, by the early 1830's he was only interested
in obtaining funds for the construction of his newest idea, the Analytical Engine.
Before discussing its details, we shall set these events in historical perspective
by noting the following. The chronometer of Harrison, which was the first one
adequate for precise longitudinal transoceanic navigation, was produced in the 1760's
after a very long and trying experience. It took Harrison 3 years to produce a
copy of his first successful model. Interchangeable parts were not to come for some
time. In fact Whitworth, who later introduced standard screw threads among
other things, lost his job with Clement when the Difference Engine project
collapsed. Babbage worked at a time which was sparked with great inventions:
the steam locomotive in 1825, the electric generator in 1831, the reaper in
1834, the electromagnetic relay in 1835, Daguerreotype in 1839 and telegraphy
in 1844. Of course, no thought of an electrical machine was possible then.
But one is impressed by Babbage's courage to attempt so complex a mechanical
device given the state of the art at the time.

Babbage's machines were all designed to be driven by a hand crank,
but in one of his accounts of his first inspiration he quotes an early
conversation with John Herschel. They were checking some tables and Babbage said
"I wish to God these calculations had been executed by steam," to which Herschel
replied "It is quite possible." Herschel, Babbage, and George Peacock had been
friends as Cambridge undergraduates, where they formed the Analytical Society.
Later Herschel became a famous astronomer and Peacock a leading algebraist at
Cambridge. Babbage later had many discussions of his machines with these men
and many of the leading scientists of the day. Laplace, Bessel and Jacobi (not
to mention the Duke of Wellington) all had extensive discussions with him.

It is fascinating to note that Boole and DeMorgan were both
contemporaries of Babbage, but no interaction between them has been noted
concerning machine design. However, Ada Augusta Byron, the poet's daughter, studied
mathematics under DeMorgan for many years. Mrs. DeMorgan notes that on an early
occasion, she took Ada to visit Babbage and that Ada quickly understood what
was going on. Some years later as Lady Lovelace, she translated Menabrea's
paper on the Analytical Engine and collaborated with Babbage.

The Analytical Engine that Babbage designed in the 1820's and 1830's
was spectacular, even by the standards of the 1950's. His design methods and
his ideas for the machine's organization and use demonstrate Babbage's genius.
The immense complexity of what he hoped to build demonstrates his kinship with
many of today's designers. By pushing funds and technology to the limit, and
often too far past the limit, he faced a long series of frustrations.

The Analytical Engine was to be a fifty decimal digit machine. Its
"store" or memory was to hold 1000 of these words (about 165,000 bits) in decimal
form. These words could be written from or read to the "mill", or arithmetic
and logical unit, via some mechanical linkages. The whole system was under
the control of a process which was described on two sets of punched cards. One
set, the "operation cards", contained the series of operations to be performed.
The other set, called "variable cards", indicated which store locations were
to be operated on by the operation cards. Babbage was quite familiar with the
Jacquard loom which was controlled by a sequence of punched cards. In fact,
the punched card idea dated back to the early 1700's, although Jacquard's famous
loom was not developed until 1804.
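
The size of the store in bits can be checked with a line of arithmetic. The
sketch below is ours, not part of the historical record; it assumes each
decimal digit carries log2(10) bits of information and ignores any sign wheels.

```python
import math

WORDS = 1000           # store capacity quoted in the text
DIGITS_PER_WORD = 50   # decimal digits per word quoted in the text


def store_bits(words=WORDS, digits=DIGITS_PER_WORD):
    """Information content of the store in bits (sign wheels ignored)."""
    return words * digits * math.log2(10)


if __name__ == "__main__":
    print(round(store_bits()))
```

The result is in the neighborhood of the figure quoted above.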

While the Analytical Engine did not have a stored program, it was
able to perform various kinds of condition tests, and then branch on the
outcome. In particular it could move its card sequence forward or backward a
fixed distance. Furthermore, there was an index register and index adder
available for loop control; to quote Menabrea, "When the number n has been
introduced into the machine, a card will order a certain registering apparatus
to mark (n-1), and will at the same time execute the multiplication of b by
b." This is in a discussion of evaluating b^n. Note that the indexing
arithmetic was apparently carried out in parallel with the multiplication. The
index register was evidently not used to index through memory, however.
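
The loop control that Menabrea describes can be sketched as a down-counting
register. This is our own reading of the quoted passage, with hypothetical
names; the counter and the multiplication are stepped one after the other
here, whereas the text suggests the Engine carried them out at the same time.

```python
def power_by_counter(b, n):
    """Compute b**n (n >= 1) with a down-counting register controlling the loop."""
    if n < 1:
        raise ValueError("n must be at least 1")
    counter = n - 1   # the "registering apparatus" is set to mark n - 1
    product = b       # running product, initially b itself
    while counter > 0:
        product *= b  # one multiplication per pass ...
        counter -= 1  # ... while the counter drops by one unit
    return product


if __name__ == "__main__":
    print(power_by_counter(3, 4))
```

The loop ends when the counter marks zero, after exactly n - 1 multiplications.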

The arithmetic unit was designed to perform fixed point, fifty digit
calculations at the following speeds: add or subtract in one second, multiply
or divide in one minute. To achieve such speeds Babbage devised, after years
of work, a parallel addition algorithm with anticipatory carry logic! He was
very proud of that accomplishment. As in the Difference Engine, Babbage provided
for multiple precision operations, automatic mechanical fault prevention and
detection, and automatic rounding and overflow detection.
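
The idea behind anticipating carries can be shown in a few lines. The sketch
below is a modern digit-level illustration under our own assumptions (digits
stored least significant first, hypothetical function name); it is not a
description of Babbage's mechanism. Every position is first added
independently, then classified as generating a carry (sum of 10 or more) or
passing one along (sum of exactly 9), and only then are the carries applied.

```python
def add_with_anticipated_carry(a_digits, b_digits):
    """Add two equal-length decimal numbers, least significant digit first.

    Returns (sum_digits, overflow).
    """
    if len(a_digits) != len(b_digits):
        raise ValueError("operands must have the same number of digits")
    # Step 1: every position adds its two digits independently.
    raw = [a + b for a, b in zip(a_digits, b_digits)]
    # Step 2: classify each position without looking at its neighbours.
    generates = [s >= 10 for s in raw]   # sends a carry regardless
    propagates = [s == 9 for s in raw]   # passes on an incoming carry
    # Step 3: resolve the carry chain from the classification alone.
    carry_in = []
    carry = False
    for g, p in zip(generates, propagates):
        carry_in.append(carry)
        carry = g or (p and carry)
    # Step 4: apply all carries at once.
    total = [(s + c) % 10 for s, c in zip(raw, carry_in)]
    return total, carry


if __name__ == "__main__":
    print(add_with_anticipated_carry([9, 9, 9], [1, 0, 0]))
```

In hardware the third step is what saves time, since a run of nines no longer
forces the carry to ripple one position per step.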

Babbage was bothered for some time about the provision of standard
function values, e.g. log x, sin x, to the machine. Finally he concluded that
either the recomputation of such numbers, essentially via a subroutine, each
time they were needed or their provision from external cards would work. He
was willing to let the decision rest on operating experience. His table
lookup procedure was arranged as follows. The machine's operator would be provided
with drawers full of such cards punched with both x and f(x). When a bell
rang the operator would read a dial and pick out the corresponding card.
The machine would check to see that the correct card had been supplied by testing
the argument and if an operator error had occurred a louder bell would ring.
He was quite proud of this idea because the problem as well as his solution had
evidently perplexed Bessel, Jacobi and others for some time.
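
The table lookup procedure just described amounts to a request followed by a
check of the argument. The sketch below is a loose modern analogy with names
of our own choosing: the operator is a function that hands back a card, the
drawer is a small dictionary of sample logarithms, and the louder bell is an
exception.

```python
class WrongCard(Exception):
    """Stands in for the louder bell that signals an operator error."""


# A tiny "drawer" of cards, each punched with x and f(x) (sample values only).
DRAWER = {2: 0.3010, 3: 0.4771}


def request_value(x, fetch_card):
    """Ask the operator (fetch_card) for the card for x and verify it."""
    card_x, card_fx = fetch_card(x)   # the operator answers the first bell
    if card_x != x:                   # the machine tests the argument
        raise WrongCard(f"asked for {x}, got the card for {card_x}")
    return card_fx


if __name__ == "__main__":
    careful_operator = lambda x: (x, DRAWER[x])
    print(request_value(2, careful_operator))
```

Because the card carries its own argument, the check costs the machine one
comparison and catches a misfiled or misread card before it is used.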

When reading Babbage, Menabrea, and Lovelace one is amazed and
delighted to see how far the questions of mechanical computing were explored.
It is tempting to read things into their statements from time to time. On
some occasions they are exasperatingly brief and sometimes they are ambiguous
or they mildly contradict each other. Such matters as the self checking
mechanisms which would jam when too much mechanical error accumulated are hard
to understand and the writers said they would not attempt a complete explanation.
On the matter of parallel arithmetic operations they make several passing
remarks. We quoted Menabrea above about index calculations. At another point,
in his summary, which seems to indicate the importance of the idea, he is
discussing the speed of the machine and says, "Likewise, when a long series
of identical computations is to be performed, such as those required for the
formation of numerical tables, the machine can be brought into play so as to
give several results at the same time, which will greatly abridge the whole
amount of the processes." This seems to be a clear statement of parallelism
between arithmetic operations.

Babbage and Lady Lovelace both discuss programming questions, but
she exhibits her own great insight in her notes on the Menabrea paper. She
was quite concerned about languages for expressing programs. One was a kind
of assembly language notation on large charts. These were translated from
another notation very much like compiler assignment statements. All variables
were denoted by $V_i$ where $i$ indicates the storage location from 1 to 1000. To
avoid the confusion of writing $V_1 = V_1 + V_2$ she introduced another index and wrote
${}^{m+1}V_1 = {}^{m}V_1 + {}^{n}V_2$ to indicate that the right hand side values
were the mth and nth values to occupy their respective storage locations. Her
machine level language was a kind of zero address operator language, although a
separate operand stream was specified to the machine. Thus, to evaluate

$$\frac{d'm - dm'}{mn' - m'n} \qquad \text{and} \qquad \frac{dn' - d'n}{mn' - m'n}$$

she would use these three operation cards 6(×), 3(−), 2(÷) where commas
separate the cards. Note that the common subexpression in the denominator
is evaluated just once. Locations were supplied by a three address scheme
using three variable cards, two for the arguments and one for the result.
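
The operation counts on the three cards can be checked with a short sketch.
It assumes, as in the Menabrea paper, that the two expressions are the
solution of the pair of equations mx + ny = d and m'x + n'y = d'; the function
name, the use of exact fractions and the tallying helpers are our own
illustrative choices.

```python
from collections import Counter
from fractions import Fraction


def solve_pair(m, n, d, m1, n1, d1):
    """Solve m*x + n*y = d and m1*x + n1*y = d1, tallying operations used."""
    ops = Counter()

    def mul(a, b):
        ops["x"] += 1
        return a * b

    def sub(a, b):
        ops["-"] += 1
        return a - b

    def div(a, b):
        ops["/"] += 1
        return Fraction(a, b)

    # Six multiplications, performed as one run of the multiply card.
    dn1, d1n = mul(d, n1), mul(d1, n)
    d1m, dm1 = mul(d1, m), mul(d, m1)
    mn1, m1n = mul(m, n1), mul(m1, n)
    # Three subtractions; the shared denominator is formed only once.
    x_num = sub(dn1, d1n)
    y_num = sub(d1m, dm1)
    denom = sub(mn1, m1n)
    # Two divisions (a zero denominator raises ZeroDivisionError).
    x = div(x_num, denom)
    y = div(y_num, denom)
    return x, y, dict(ops)


if __name__ == "__main__":
    print(solve_pair(2, 3, 8, 1, -1, -1))
```

The tally comes out as six multiplications, three subtractions and two
divisions, matching the three operation cards.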

She finally suggests a loop notation using the Σ sign to denote
loop control. She also allows for an index variable and nested loops! Her
notes contain several quite complex programs but she and Babbage were not
bothered by long programs. In fact they were both heartened by the fact that
Babbage owned a Jacquard tapestry which had required over 20,000 cards for
its production. She does remark that from the standpoints of time required
and ultimate accuracy, some numerical results would be impossible to attain
in any practical sense.

We noted above that during the course of the Difference Engine
project, Babbage had received 17,000 pounds from the Government. He had
spent perhaps as much of his personal inheritance from his banker father.
Thus, by the time he was deeply involved with the Analytical Engine, sources
of funds were scarce. Evidently Lady Lovelace and her husband were fairly
well heeled and were both interested in horse racing as was Babbage. So at
one point they devised betting procedures, evaluated them on the prototype
Difference Engine, and lost a good deal of the Lovelace fortune.

On another occasion Babbage studied game playing (including chess)
on the Analytical Engine and designed a tic-tac-toe machine. He proposed to
put several of them on the road with admission charges. Perhaps he had heard
of Maelzel's "automatic chessplayer" which was revealed to contain a man. One
is also reminded of Maelzel's collaboration with Beethoven which resulted in
"Wellington's Victory" but no machine. In any case, Babbage dropped this plan.

Viewed on the whole, Babbage's life was a very interesting and creative
one; his computing activities formed only one facet of his career. We conclude
with a short discussion of some of his other interests. He carried on a
lifelong battle with street musicians, hauling them into court on several occasions.
As a result, his home was the scene of frequent retaliatory concerts. Being
much interested in the heart beat and respiratory rates of all animals, he
took every opportunity in his travels to measure these. On one occasion he
had himself sealed inside a 265° F oven for about five minutes to study the
effects on himself. Railroads, a new invention, were a great interest and
he is credited with many ideas including the invention of the first recording
speedometer as well as the first cowcatcher. A contribution of which he was
very proud was a notation for describing the motion and "logic" of his mechanical
drawings for his Engines. Earlier in his life he and his Analytical Society friends
had been instrumental in getting English mathematicians to drop Newtonian notation
for the calculus in favor of that of Leibniz. We shall end this discussion with
an abbreviated list of other writings and work: an operations research type study
of the post office system; meteorological and tree ring observations, electricity
and magnetism, a light house occulting system widely adopted, various other
signaling schemes and a study which convinced him that the Analytical Engine
could play chess with a "3 or more" move lookahead. In short, while Babbage may
occasionally have been in error he was seldom at a loss for ideas about a subject.

He was Lucasian Professor of Mathematics at Cambridge for nine years,
but bitterly remarked that that was the only honor conferred on him by his own
country. Babbage's entire life was filled with the frustration of having few
of his ideas appreciated and even fewer adopted. Toward the end of his life
a friend noted, "He spoke as if he hated mankind in general, Englishmen in
particular, and the English Government and Organ Grinders most of all." In
his book "The Exposition of 1851" he expressed his feelings quite clearly when
he wrote, "Propose to any Englishman any principle or any instrument, however
admirable, and you will observe that the whole effort of the English mind is
directed to find a difficulty, a defect, or an impossibility in it. If you
speak to him of a machine for peeling a potato, he will pronounce it impossible;
if you peel a potato with it before his eyes, he will declare it useless because
it will not slice a pineapple. Impart the same principle or show the same machine
to an American or to one of our Colonists and you will observe that the whole
effort of his mind is to find some new application of the principle, some new
use for the instrument." It was noted in his obituary that
he lived to be almost 80, "in spite of organ grinding persecutions."

Actually Babbage lived to see some small successes for his ideas.
Inspired by a published account of his Difference Engine, a Swedish printer,
George Scheutz, and his son, Edward, built a machine. Scheutz spent a good
deal of his own money and had some government support. In 1854 he exhibited
in England his fourth order, eight digit difference machine with a printing
output mechanism. Babbage and his son received Scheutz warmly and after a
good deal of publicity the machine was sold to the Dudley Observatory in Albany,
New York. Whether or not it was much used seems to be in question. In any
case, a copy was made in 1863 and the British Government used it to compute
actuarial tables for the newly emerging life insurance business, a topic on
which Babbage had discoursed in earlier times.

Babbage's son, H. P. Babbage, continued to work on the Analytical Engine
and after his father's death managed to construct some working parts of the mill
between 1880 and 1910. At a demonstration this machine computed and printed a
table of twenty digit multiples of π.

In the 1880's another interesting forerunner of modern computer
equipment was under development. Working at the U. S. Patent Office, Herman Hollerith,
an engineering graduate of Columbia, constructed a punched card tabulating machine.
By 1890, Hollerith machines were in use at the U. S. Census Bureau for processing
returns of the 1890 census. Hollerith later went into business for himself,
manufacturing a variety of card processing equipment. He was quite successful
and as we shall see below, his company became a basic building block in the
modern computer industry.

C-T-R et seq.

In 1892, young Thomas J. Watson launched his sales career on a horse
drawn wagon, peddling sewing machines, pianos, organs and caskets out of
Painted Post, New York. Before long he had moved to Buffalo and Rochester
and became a star salesman for the National Cash Register Company of Dayton,
Ohio. His record having been observed by J. H. Patterson, the head of NCR,
Watson was elevated to various positions and by 1914 was more or less the
number two man at NCR, which by then was the largest cash register company in the
U. S. His position in the company and the company's position with respect to
competition caused Watson some difficulty.

First, Patterson was a manager who ruled with an iron, if somewhat
bizarre, hand. His executives had to engage in various Patterson designed
regimens (e.g. prework group horseback riding and special foods) and were fired
for various kinds of real or imagined insubordination. Occasionally instead of
firing someone Patterson would provide him with a "fresh start" by moving the
entire contents of his office out on the front lawn, dousing it with kerosene
and touching a match to it. So, after almost twenty years with NCR and having
survived many earlier purges, Watson was fired by Patterson in 1914.

The foremost market position of NCR was due in large part to Watson's
efforts, but this was his second difficulty. Some months before his firing,
a number of top management NCR people including Patterson and Watson had
been taken to court for a number of illegal business practices. They had
essentially eliminated all competition in the new and used cash register business
by strong selling, price cutting, industrial espionage, personal harassment and
their ultimate weapon, the "knockout machine." This was a flimsy copy of a
competitor's machine which would be sold cheaply as the real thing and soon
break down. Watson was at the time of his firing appealing a fine and one year
jail sentence. In spite of this, Watson asked Charles R. Flint for a job.

Flint was a New York tycoon, who had invested in practically everything,
and in 1911 had formed one of the early conglomerates of diverse product
manufacturers, the Computing-Tabulating-Recording Company, otherwise known as
C-T-R. This included a number of companies making equipment that could be
called business machines, and included Herman Hollerith's Tabulating Machine
Company. Flint proposed Watson to the Board as manager of C-T-R; there
were some raised eyebrows, but Flint prevailed. Later the jail sentence and
other litigation disappeared. Watson moved rather slowly at first, but became
C-T-R president and by 1924 was solidly in command. In 1924 he changed the name
of the company to International Business Machines.

In many ways, Watson ran IBM as Patterson ran NCR. He was once
referred to as a "benevolent despot", but he was more rational and if not
intellectually inclined, he did enjoy and have good intuition about making money. IBM
flourished and by the mid 1930's Watson was the highest paid person in the U. S.

Watson's interest in developing new products as a way to higher
profitability caused him to support various new machine development activities within
the company. He also enjoyed talking with people inside and outside IBM about
possible uses of his equipment. Thus, when he was telephoned by a young
education professor at Columbia, Benjamin D. Wood, in 1928, Watson said he could
spare an hour for a lunch meeting. The meeting went well and Watson stayed
until 5:30 listening to the problems and ideas Wood presented. In short, Wood
had been developing intelligence tests for college students and had ^3jOOO
to process. With a room full of girls and some equipment he had designed, the
processing of these tests was costing at least $5.00 each. He explained how
these tests and similar material could be processed for perhaps 10 or 20 cents
using IBM equipment, perhaps with some modification. Two days later Wood had
a room full of IBM equipment at his disposal, free of charge. His predictions
were correct and he continued to offer suggestions to Watson including one that
the mechanical parts should be eliminated in favor of all electrical equipment.
This association led to a line of IBM equipment for education, and Wood remained
an IBM consultant for many years. More important, the equipment attracted the
attention of other Columbia faculty and students. An astronomy graduate student,
Wallace Eckert, talked to Wood and Watson. This later led to another gift to
Columbia, the T. J. Watson Astronomical Computing Bureau. One of Watson's top
engineers, Clair D. Lake, built a special machine for the Bureau. It was the
first machine which could multiply and it also had a sequencing mechanism. It
was used for the computation of astronomical and navigational tables; the latter
were very important in antisubmarine warfare in the North Atlantic in the late
1930's. Later, Eckert joined IBM as the first director of the T. J. Watson
Laboratory which was located near the Columbia campus.

Eckert's earlier astronomy calculations had attracted a good deal of
attention and among his visitors were Harlow Shapley, astronomy professor at
Harvard University, and James B. Conant, the president of Harvard. Shapley
discussed the Columbia work with Howard Aiken who was teaching mathematics in
Harvard's Graduate School of Engineering. Aiken had known about the state of
the art in computing and had been thinking about building a more complex machine.
Shapley prompted Aiken to visit Eckert at Columbia and later to discuss his
ideas with James W. Bryce of IBM. Bryce had been one of IBM's key inventors
for thirty years and as a result of these discussions Watson put up a million
dollars to build a machine for Aiken.

Although Watson had a reputation for occasionally trampling on
everyone close to him, including the Columbia professors, Aiken had shopped around
and found no one but IBM capable of building his machine (whose details will
be discussed later). Aiken also had a strong personality. Watson apparently
did not involve himself much with the project until the machine was finished.
At that point he decided it should be enclosed in a special glass and
stainless steel case; Aiken strongly disagreed. Watson won that round as he always
had within the company. Watson had been honored by many organizations and nations
and expected that his gift of a million dollar machine plus another $200,000 for
operating it would bring out the best in Harvard. When Watson arrived at Harvard
for the dedication he found that it was Aiken and not Watson who was to get the
credit for the machine. After raising a ruckus which included a threat to take
the machine away, Watson was calmed down by President Conant who then made a
speech at the dedication.

Watson died in the mid-1950's and was succeeded by his son as president
of IBM. The company has continued to build punched card equipment and other
machines.

Modern Machine Beginnings

Three men ushered in the modern digital computer era in the 1930's.
They were Howard H. Aiken of Harvard University, George R. Stibitz of Bell
Telephone Laboratories and Konrad Zuse of the Technische Hochschule in Berlin.
Collectively they designed and built a number of relay machines and by the
1940's, each had completed a general purpose programmable digital computer.
They all apparently worked independently of one another, although Aiken used
the engineering talent of IBM to build his machine; in particular three men were his
coinventors: B. M. Durfee, F. E. Hamilton and C. D. Lake, who had designed a good
deal of earlier IBM equipment. By 1946, J. P. Eckert (no relation to Wallace Eckert)
and J. W. Mauchly of the Moore School of Electrical Engineering at the University
of Pennsylvania had successfully completed ENIAC, the first electronic digital
computer. This attracted the attention of John von Neumann who, as a consultant,
with Eckert and Mauchly proposed EDVAC, the first stored program computer. This
design was modified and embellished by a number of people and by 1950 there
were more than a dozen big machine projects under way. By 1950, so many of the
ideas used in current machines had been proposed and experimented with, that it
will take us a good deal of space to outline some of the details. It is, of course,
impossible to pin down who had each idea first but we shall attempt a rough
chronological ordering based on various published documents.

Zuse evidently began first (he had his first ideas in 1934) but his
influence outside Germany was probably the smallest of the pioneers. Unfortunately,
most of his early work was destroyed during the war. His special purpose relay
machines Z1 and Z2 were built between 1936 and 1940. Z3 was a general purpose
machine which operated under external program control. It had a 64 word data
memory and the numbers were of binary floating point format: 22 bits with 14
mantissa, 7 exponent and one sign bit. The machine contained 2600 relays and
was built between 1939 and 1941. During the war Zuse developed two special
purpose control computers, one of which continuously sampled 100 points for process
control. Following the war, Zuse went into business, building and commercially
manufacturing Z5 and subsequent machines. As we shall see, Stibitz was almost an
exact American parallel of Zuse, although a few years behind him.
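
The 22 bit word described above can be pictured as three packed fields. The
sketch below is illustrative only: the text gives just the field widths, so the
field order (sign, then exponent, then mantissa), the treatment of every field
as an unsigned integer, and all of the names are assumptions made here.

```python
# Field widths taken from the text: 14 mantissa + 7 exponent + 1 sign = 22 bits.
MANTISSA_BITS = 14
EXPONENT_BITS = 7
SIGN_BITS = 1
WORD_BITS = MANTISSA_BITS + EXPONENT_BITS + SIGN_BITS


def pack_word(sign, exponent, mantissa):
    """Pack three unsigned fields into one integer (assumed layout)."""
    if not 0 <= sign < (1 << SIGN_BITS):
        raise ValueError("sign does not fit in 1 bit")
    if not 0 <= exponent < (1 << EXPONENT_BITS):
        raise ValueError("exponent does not fit in 7 bits")
    if not 0 <= mantissa < (1 << MANTISSA_BITS):
        raise ValueError("mantissa does not fit in 14 bits")
    return (
        (sign << (EXPONENT_BITS + MANTISSA_BITS))
        | (exponent << MANTISSA_BITS)
        | mantissa
    )


def unpack_word(word):
    """Split a packed word back into (sign, exponent, mantissa)."""
    mantissa = word & ((1 << MANTISSA_BITS) - 1)
    exponent = (word >> MANTISSA_BITS) & ((1 << EXPONENT_BITS) - 1)
    sign = word >> (EXPONENT_BITS + MANTISSA_BITS)
    return sign, exponent, mantissa
```

With the largest value in every field the packed word has all 22 bits set,
which confirms that the three widths account for the whole word.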

At Bell Labs, Stibitz built his Model I or "complex computer" between
1938 and 1940. It was not a programmable machine; it simply performed complex
arithmetic on numbers presented via a teletype keyboard. Its main claim to fame
is that Stibitz demonstrated the first remote terminal system (keyboard and printer)
to an American Mathematical Society meeting at Dartmouth in 1940, using the machine
which was in New York City.

Subsequently Bell Labs built several other relay machines, including
an interpolator and a ballistic computer each of which had a few internal registers
for data storage. Between 1944 and 1947 Stibitz and S. B. Williams built the
Model V system which was a general purpose two processor machine. This machine
contained 9000 telephone relays and 50 pieces of teletype equipment occupying
1000 square feet of floor space. The speeds of each processor were: 300
milliseconds for addition, 1 second for multiplication, about 5 seconds for divide
or square root, and .07 seconds for a register to register transfer. Earlier Stibitz
machines had used an excess three binary number system, but for this machine Stibitz
invented and used biquinary decimal numbers for several reasons. It made self
checking, conversion to decimal, and implementation in relay circuits relatively
easy. The numbers were floating point with seven decimal digits and an exponent
of magnitude less than 20. Each processor's internal memory was 15 relay registers.
The entire system consisted of two such processors and three I/O positions, all
interconnected. Each I/O position could handle a number of I/O devices. Thus
one job could use both processors or two separate jobs could be run together.
Furthermore, the machines could, on completing one job, switch to another I/O
position. Thus, set up time by a human operator could be masked. Also the tape
motion time to access a new job could be masked and by preparing a number of jobs
on several paper tapes the machine could be run overnight, unattended.
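
To see why a biquinary code makes self checking easy, consider one common form
of it, in which each decimal digit is held on seven lines: two "bi" lines
(worth 0 or 5) and five "quinary" lines (worth 0 to 4), with exactly one line
on in each group. Whether Model V used precisely this arrangement is an
assumption here, and the function names are hypothetical.

```python
def encode_digit(digit):
    """Return (bi, quinary) line settings for a decimal digit 0..9."""
    if not 0 <= digit <= 9:
        raise ValueError("digit must be in 0..9")
    bi = [0, 0]
    quinary = [0, 0, 0, 0, 0]
    bi[digit // 5] = 1        # selects the 0 or the 5 component
    quinary[digit % 5] = 1    # selects the 0..4 component
    return tuple(bi), tuple(quinary)


def is_valid(bi, quinary):
    """Self check: exactly one line must be on in each group."""
    return sum(bi) == 1 and sum(quinary) == 1


def decode_digit(bi, quinary):
    """Convert line settings back to a decimal digit, rejecting bad codes."""
    if not is_valid(bi, quinary):
        raise ValueError("invalid biquinary code")
    return 5 * bi.index(1) + quinary.index(1)
```

Any single line that sticks on or fails to close leaves a group with two lines
or no lines on, so the fault is detected without any extra check digit.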

The machine was programmed using a simple three address symbolic 
language, taking advantage of the fact that the 15 registers were named by
letters of the alphabet. Loops could be programmed by making paper tape loops.
With typical Bell System concern for reliability, the machine had various self
checking features and high reliability was achieved. The chief cause of difficulty
was dirty relay contacts. Various lamps would indicate to an operator where
the difficulty was if the machine stopped. On an unattended run, the machine could
abort one job and proceed to try the next one if a fault occurred. Two of these
machines were built, one for the National Advisory Committee for Aeronautics
(Langley Field, Va.) and one for the Ordnance Department of the Army (Aberdeen
Proving Ground, Maryland).

Bell Telephone Labs constructed a Model VI system in the late 1940's
which was installed at their Murray Hill, N. J. Laboratory. This machine was in
several ways an improved version of the Model V. First, it had a number of remote
terminals from which jobs could be submitted to the machine via telephone
lines. Second, when a job failed for some reason, the machine would automatically
restart it and try once more. A sticky relay might work the second time. If
not it would go on to the next job as did Model V. These two features made the
system appear to be very much like a modern machine with a remote entry batch
processing operating system.
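
The restart policy just described (one retry, then move on to the next job)
can be sketched in a few lines. This is only a modern illustration of the
policy, not a description of the relay control circuits; the function names and
the use of an exception to stand for a machine fault are assumptions.

```python
def run_batch(jobs, attempts_per_job=2):
    """Run each (name, job) pair in order; a job raises RuntimeError on a fault.

    A failed job is restarted once (two attempts in all). If it fails
    again it is abandoned and the next job is started.
    """
    outcome = {}
    for name, job in jobs:
        outcome[name] = "abandoned"
        for _ in range(attempts_per_job):
            try:
                job()
            except RuntimeError:
                continue  # fault: try the same job again if attempts remain
            outcome[name] = "completed"
            break
    return outcome


def make_sticky_job(faults_before_success):
    """Hypothetical job that faults a fixed number of times, then succeeds."""
    remaining = [faults_before_success]

    def job():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise RuntimeError("relay fault")

    return job


results = run_batch([
    ("job A", make_sticky_job(0)),  # runs cleanly the first time
    ("job B", make_sticky_job(1)),  # sticky relay: works the second time
    ("job C", make_sticky_job(5)),  # keeps failing, so it is abandoned
])
```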

Another interesting feature of Model VI was the ability to wire in 
subroutines. Provisions were made for up to 200 such subroutines. They could 
call each other and be nested down to four levels. Since the program was
otherwise external paper tape, this speeded up the operation of the machine and
made the programmer's life easier.

Models V and VI were both "asynchronous" machines. That is, they had 
no controlling clock; when one step of an operation was over it caused the next
step to begin. This design philosophy has been tried with varying success in some
modern high speed machines.

In contrast to the Bell Labs approach, Aiken and the IBM group 
designed a synchronous computer which was operated at a 300 millisecond cycle.
This machine was designed and built between 1937 and 1944. IBM became involved
in 1939 and the work from then until completion was carried out in their facilities
at Endicott, N. Y. The machine was operated at Harvard University, and was known
either as the Automatic Sequence Controlled Calculator or the Harvard Mark I.
Mark I was 8 feet high, 51 feet long and 6 feet deep. It was a decimal, fixed
point machine using a 23 digit plus sign word. It could store 72 such words in
10 position counter wheels and had an additional 60 number storage facility in
manually set dial positions (what would now be called a read only memory). Its
speeds were add or subtract in 300 ms, multiply in 6 seconds, divide in 11.4
seconds, and it could evaluate several special functions in about one minute.
These latter were so slow that faster, lower accuracy subroutines were often used.
The machine could also perform double precision or half word operations.

Instructions were externally stored on 24 hole paper tape and were
in two address format. Initially it could conditionally jump to one of two
external tape routines based on the range of an argument. This was later changed
to a branch to one of several tapes based on a more general transfer on minus 
instruction. 

Programming for maximum speed could present interesting challenges. 
All operations shared a main bus and during the execution of a long operation
the programmer could initiate shorter commands such as addition or certain I/O
operations. A hardware interlock prevented these "interposed operations" from
conflicting with the longer ongoing operation. Evidently this technique was used
a great deal. Mark I was the first large scale machine to be completed and was
first used to compute various tables and later used to solve systems of algebraic
and differential equations. After it was broken in, Mark I was quite reliable,
reportedly available 95% of the time in 1950, and it was in use for 15 years.

While we have gone over the period of early development in a very
quick way, it is clear that spectacular progress was made. Zuse, Stibitz and
Aiken had broken ground for events that in the subsequent five years would yield
the "modern" digital computer. While their mechanical realizations were great
feats of engineering, their ideas were mainly rediscoveries of things that were
well known to Babbage exactly 100 years earlier. For their implementations alone,
however, they would have earned Babbage's respect, as he wrote in "The Life of a
Philosopher" in 1864, "If, unwarned by my example, any man shall undertake and
shall succeed in really constructing an engine embodying in itself the whole of
the executive department of mathematical analysis upon different principles or
by simpler mechanical means, I have no fear of leaving my reputation in his
charge, for he alone will be fully able to appreciate the nature of my efforts
and the value of their results."

The Second Wave

The improvements introduced in the next wave of machines included electronic
parts, large internal memories, stored programs, index registers, and magnetic
tape and drum secondary storage. By the early 1950's the typical machine could
multiply in a few milliseconds and had 1024 words of primary memory. We shall
attempt to point out the most important steps in terms of the people who made
them and the machines they built.

In 1943, Mauchly and Eckert undertook the design of what turned out to be
one of the physically largest computers made before or after that time. ENIAC
was sponsored by the Army Ordnance Department and was intended to integrate
ordinary differential equations for the generation of ballistics tables. It
was finished at the Moore School in February, 1946. The machine was configured
in a U-shape but overall it was about 100 feet long and 8 1/2 feet high. It
contained 18,000 vacuum tubes and 1500 relays and consumed 150 kw of power. Each
register in the machine used 550 tubes and was about 2 feet wide and 8 1/2 feet
high. In spite of its gargantuan dimensions the machine was very fast and quite
reliable.

ENIAC was a ten digit fixed point decimal machine with a parallel arithmetic
unit which performed at the following speeds: add in 200 µs, multiply in
2.8 ms, and divide in 6 ms. It also had a square root unit and was capable of
double precision operations. Its internal memory consisted of 20 registers, each
of ten digits. It was able to do I/O and arithmetic simultaneously and had an
800 card per minute reader. Nevertheless, computations were often I/O bound and
while its raw speed was a factor of 1000 over Mark I its overall performance may
have been closer to a speedup of two or three hundred. The machine was externally
programmed by attaching various portable "function tables" which would be arranged
by the programmer. These external tables could also be used as a read-only data
memory. The machine was capable of conditional jumps although this feature
evolved in time. The time to set up the machine for a particular calculation
ranged from a half hour to a day. In 1947 its "up time" was estimated to be 20%
but by 1950, measured over a one month period; the hardware was available Q% of 
the time; when set up time and program hangups were included, 67% utilization was
measured. After completion, the machine was moved to the Aberdeen Proving Ground
and various improvements were made. John von Neumann was instrumental in making
the programming easier and faster via external boards, wires, and switches.

Having been attracted by ENIAC, von Neumann became a consultant to the Moore
School group and began to study the question of machine design. In 1944, Eckert
had written a memo suggesting the use of a magnetic drum or disk as the primary
memory of a machine. The use of a variety of memories for radar systems had
developed during World War II. Crawford had written a thesis at MIT in 1942
suggesting a magnetic disk or drum in this context, and a variety of acoustic
delay line memories were in use by radar people at the time.

In 1945, von Neumann wrote a memo as an ENIAC consultant discussing a
stored program machine. This important idea, due perhaps to Eckert, Mauchly and
von Neumann, led to a new project to build EDVAC. This was to be a machine of
much more modest size than ENIAC, but with a larger internal memory and slightly
slower arithmetic. While it spawned a great many other machines and ideas, EDVAC
was not the first stored program machine to become operational. The project was
begun in 1946 and the machine was not operational until 1952. During this period,
Mauchly and Eckert left the Moore School to form their own computer company and
von Neumann launched his own project at Princeton taking with him several other
Moore School people.

In any case, EDVAC was a binary, 44 bit, fixed point machine with a bit
serial arithmetic unit. This required only 3500 tubes to achieve average speeds
of 850 µs for add, and 2.8 ms for multiply. It had a mercury delay line memory
which contained 1024 words of data and program. This was organized as 128 delay
lines each containing 8 words. This memory led the designers to choose a four
address instruction format, two for arguments, one for result, and one for next
instruction, since any of these could be anywhere in the 1024 word circulating
memory. The machine had two arithmetic units; the second was used for checking the
first.
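
The role of the fourth address can be illustrated with a toy interpreter. This
is a sketch of the general four address idea only, not of EDVAC's actual order
code: the operation names, the tuple encoding of an instruction and the use of
None to mean "halt" are all assumptions made for the example.

```python
def run(memory, start, max_steps=1000):
    """Execute four address instructions held in `memory` (a dict).

    Each instruction is (op, arg1_addr, arg2_addr, result_addr, next_addr).
    """
    pc = start
    for _ in range(max_steps):
        if pc is None:           # assumed halt convention
            return memory
        op, a, b, result, next_pc = memory[pc]
        if op == "add":
            memory[result] = memory[a] + memory[b]
        elif op == "mul":
            memory[result] = memory[a] * memory[b]
        else:
            raise ValueError("unknown operation: " + op)
        pc = next_pc             # the successor is named explicitly
    raise RuntimeError("step limit reached")


program = {
    0: ("add", 100, 101, 102, 7),     # memory[102] = memory[100] + memory[101]
    7: ("mul", 102, 102, 103, None),  # memory[103] = memory[102] squared; halt
    100: 2,
    101: 3,
}
final = run(dict(program), start=0)
```

The two instructions are deliberately not adjacent (locations 0 and 7): since
every instruction names its successor, instructions and operands may sit
anywhere in the circulating memory.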

England Pulls Ahead 

Following a visit to the Moore School, Maurice Wilkes of Cambridge
University started a project at Cambridge at the end of 1946. This led to
EDSAC, the first stored program machine to be completed, in 1949. EDSAC was
quite similar in design to EDVAC although somewhat slower. It had a 1.5 ms add 
time, an average 6 ms multiply time and required a few hundred ms for division. 
Its memory characteristics were much like those* of EDVAC described above. The 
overall machine had about 3000 tubes and dissipated 15 kw. Wilkes was quite 
interested in questions concerning the programming and use of the machine. 
Among other things, he developed a large subroutine library for EDSAC users. 

Others had preceded Wilkes in England with thoughts about automatic
computers. Alan M. Turing had published his famous paper in 1936 and J. R.
Womersley at the National Physical Laboratory had begun to think about real
machines in 1945. By 1947, Turing and others had joined him to begin a project
which led to the construction of ACE, the pilot model being completed in 1950.
The ACE pilot had only about 1000 tubes but achieved an add time of 52 µs on
32 bit words. Its small component count made it very reliable. Shortly after
the NPL activity began, the Telecommunications Research Establishment began to
study the problem. This led to the development of MADM at Manchester University,
the project being moved there in early 1947 with continuing support from the
Telecommunications Research Establishment.

Delay lines had a rather long latency; since they operated at a few
megacycles and contained several hundred bits, it could take a millisecond to
access a word. Thus a random access, large, cheap memory device was sought. At
Manchester, F. C. Williams developed the "Williams tube" which filled this bill.
Kis first tube worked in ISh^ and was used in a prototype machine by June of 
1948. This was a cathode ray tube with bits stored on its face. They could be
capacitively sensed and access time was a function of electron beam switching
and sensing times only. Thus, the first large random access memory was available.
In 1948, the Manchester group, which also included T. Kilburn, demonstrated
a 2000 rpm, head per track, magnetic drum and used this as backup to Williams tube
primary memories in 1949.
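
The delay line latency mentioned at the start of this discussion can be checked
with rough arithmetic: a word can only be read when it emerges from the line,
so in the worst case one waits a full circulation. The figures below are
illustrative assumptions (the text says only "a few megacycles" and "several
hundred bits"), the function name is hypothetical, and the time to read the
word itself is ignored.

```python
def delay_line_latency(bits_per_line, bit_rate_hz):
    """Return (worst_case, average) wait in seconds for a word to emerge."""
    circulation_time = bits_per_line / bit_rate_hz
    return circulation_time, circulation_time / 2


# Assumed example: a 500 bit line pulsed at 0.5 megacycles.
worst, average = delay_line_latency(500, 500_000)

# Assumed example: 8 words of 44 bits per line (the EDVAC organization
# quoted earlier), with a 1 megacycle bit rate chosen for illustration.
edvac_like_worst, edvac_like_average = delay_line_latency(8 * 44, 1_000_000)
```

By contrast, the access time of a Williams tube depends only on beam switching
and sensing, not on where the word happens to be in a circulating stream.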

Using this memory hierarchy, they issued I/O instructions for blocks
of data from the drum and stole processor cycles to access the main memory. They
built another prototype in 1949 that had an interesting new feature which they
called the B-tube. Using the B-tube, they said, "...instructions, and in particular
their address section, could be modified in their effect without being modified
in their stored form." Thus appeared the world's first index register. With
these important innovations as background, they designed MADM in 1949 and it
was finished in 1951. This was a one address, binary machine with 40 bit,
fixed point operations. Its arithmetic speeds were: addition in 1.2 ms and
multiplication in 2.16 ms. 1600 pentodes and 2,000 diodes were used. The Williams
tube memory consisted of 512 words stored in 8 tubes, together with a 150,000
bit drum.
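
The quoted description of the B-tube corresponds to what is now called indexed
addressing: the stored instruction is left unchanged and the contents of an
index register are added to its address at the moment it is executed. A minimal
sketch follows; the names and the one address "load" operation are hypothetical
and are not taken from the Manchester order code.

```python
def effective_address(stored_address, b_register):
    """Address actually used: the stored address modified by the index."""
    return stored_address + b_register


def sum_table(memory, base, count):
    """Add up `count` consecutive words using one unmodified instruction."""
    instruction = ("load", base)   # stored form; never altered in the loop
    total = 0
    for b_register in range(count):
        _, stored_address = instruction
        total += memory[effective_address(stored_address, b_register)]
    return total, instruction


memory = {10: 4, 11: 5, 12: 6}
total, instruction_after = sum_table(memory, base=10, count=3)
```

Without an index register the program would have to rewrite the address part
of the stored instruction on every pass through the loop.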

We remarked earlier that magnetic recording on disks or drums had been
suggested at least as early as 1942. The first successful machine to use a
magnetic drum was built in 1947 by A. D. Booth at the University of London.
It was called SEC and had 256 words of 21 bits. The arithmetic unit employed
only 230 tubes and had a 1.6 ms add time.

Meanwhile, Back at Princeton

Just a year after his EDVAC report, von Neumann and two co-workers,
Arthur W. Burks and Herman H. Goldstine, published another report. This was
June, 1946 and they were all at the Institute for Advanced Study (IAS) at
Princeton University; Burks and Goldstine had both been at the Moore School
for some time and had been involved with ENIAC. Their new report was entitled
"Preliminary Discussion of the Logical Design of an Electronic Computing
Instrument," and it was a detailed, clearly argued discussion of many details
of machine design. In 1947 Goldstine and von Neumann wrote an accompanying
document on the analysis and coding of problems for the machine. These
documents led to the construction of the IAS machine which was completed in
1952. Julian H. Bigelow was the chief engineer in charge of the IAS machine.
This project became the focal point of computing activities in the U. S. The
project was funded by the Army Ordnance Department, with contributions from
the Air Force, the Office of Naval Research and the Atomic Energy Commission.
The IAS machine was completed in June, 1952 and was a rather compact
unit; excluding the I/O gear its dimensions were 8 x 8 x 2 feet. It contained
2500 tubes (many double triodes) and 40 Williams tubes each containing 1024 bits.
Thus the memory contained 1024, 40 bit words each being interpreted as one
fixed point number or two instructions. The machine had a one address order
code with 10 bits of address per instruction. The memory access time was about
25 µs and excluding this, the average arithmetic times were: 15 µs for addition,
400 µs for multiplication and 1 ms for division. Many engineering innovations were
included; among them a word parallel memory access feature not included in the
Manchester machines. The arithmetic unit also operated in parallel and the machine
was asynchronous.

This machine and project were quite important from several standpoints.
First, the excellent engineers who built the machine had a number of rather good
recent inventions to use. Second, von Neumann and his staff thought very
imaginatively and broadly about how to use the machine. Finally, their reports
and visitors caused this machine's reputation to be widely known. A number of
copies of the machine were built.

In parallel with the IAS activity, the Servomechanisms Laboratory of MIT
began to build a machine. One original motivation was the problem of real time
aircraft simulation. The Whirlwind I project began in 1947 under Office of Naval
Research sponsorship and was directed by Jay W. Forrester. Very high speeds were
achieved in the 15 bit (plus sign) parallel, fixed point arithmetic unit: add in
8 µs, multiply in 24 µs. When memory fetch time was included, both operations
averaged 180 µs. Whirlwind was a synchronous machine with a 2 megacycle clock for
the arithmetic unit. It was also a stored program machine. The machine was
operational in 1951.

One important outcome of the MIT activity was in the primary memory area.
Initially, Whirlwind had a 1024 word, 16 bit, modified Williams tube memory.
Under Forrester's direction, alternative memory devices were being studied. The
MIT group was in close competition with an RCA team headed by Jan Rajchman. At
least by virtue of consent decrees some ten years later, MIT won the race.
(The settlement included royalty-free rights to RCA and a $13 million license
to IBM.) In 1953 they had installed in Whirlwind a 2048 word coincident
current magnetic core memory. This memory had a 1 µs read time and an 8 µs write
and cycle time and the cores were about 80 mils OD. The machine also had a
cathode ray tube for output display with a computer controlled camera attached.

Thus by 1953, Whirlwind I with its core memory, and the IAS machine were
both in operation. These two machines are regarded by many people as the first
of the "modern" digital computers. They had combined some ten years of engineering
development by a number of other groups together with their own inventions and
excellent engineering. The influence of these machines was widely felt in both
university projects and the newly emerging electronic computing industry.

A New Industry Begins

We mentioned earlier that one of the reasons that EDVAC was not completed
earlier may have been the departures of von Neumann and his people to the IAS
project as well as Eckert and Mauchly to form their own company. In December, 1947
the Eckert-Mauchly Computer Corporation was founded with financial backing
from a multimillionaire. The firm designed and built BINAC for Northrop Aircraft
under an Air Force contract. It was an EDVAC-like machine with a delay-line
memory and about a one millisecond arithmetic speed. BINAC was demonstrated
in August, 1949.

At the time, their only commercial competition was from IBM which was 
selling various combination electronic and electromechanical devices. These
included the Selective Sequence Electronic Calculator (SSEC), the 604
Electronic Calculating Punch, and the Card Programmed Calculator (CPC) all
introduced in 1948. The CPC actually grew out of an experiment in which a 604
and an accounting machine were joined by people at Northrop. None of these was
a stored program machine, and it looked as if the Eckert-Mauchly Corporation had
a clear field. Based on their BINAC experience they designed a new machine,
UNIVAC, and began taking orders at $250,000 per system.

At that point their fortune changed. Their financial backer was killed 
in an airplane crash at about the time they realized that the $250,000 UNIVAC
price tag was too low to make a profit. Seeking funds they talked with people
at the T. J. Watson Laboratory in New York. The technical people there were
enthusiastic about UNIVAC but evidently on Watson's decision, the Eckert-Mauchly
talks were terminated. James Rand of Remington Rand then discussed the matter
with Eckert and Mauchly and subsequently took over their company.

At the time, Remington Rand had a line of desk calculators as well as
various punched card equipment. Unlike IBM, Remington Rand used a 6 row, 90
column card. While IBM equipment had been primarily designed for "business
applications" it had found its way into many "scientific" uses. Remington
Rand equipment seems to have retained the flavor of "business equipment" only,
at that time.

The first UNIVAC was delivered to the Bureau of the Census in June of
1951. UNIVAC was a synchronous machine and had a delay line memory of 1000
(not 1024) words of 12 decimal digits. The serial arithmetic unit operated at
about 1 millisecond and the numbers were binary coded decimal in excess three
format. Magnetic tapes were used as secondary memory and special buffer registers
were provided for data entry to primary memory. UNIVAC was quite successful and
48 systems were built (sale price was ^.7 50 although they were also leased).

In 1952 Remington Rand bought out Engineering Research Associates of 
Minneapolis. ERA had been a pioneer in commercial magnetic drum manufacture and 
had designed their 1101 and 1102 computers around their drum. The UNIVAC name
had numbers attached to it for later Remington Rand machines and still later the 
1100 numbering scheme was resurrected. 

IBM finally saw the light and in 1950 began a project which led to the
IBM 701 by the end of 1952. The 701 was a 36 bit fixed point, synchronous, parallel
machine with a 2048 word Williams tube memory. Its speed was about 40 µs for
addition and 400 µs for multiplication or division. This was the beginning of
a long series of 700 and 7000 series machines. It also signalled the end of
the open field for Remington Rand. With Watson's aggressive sales background
and widely established sales network, IBM quickly moved in. Eventually nineteen
701 systems were sold and many other machines followed.

Thus by 1953, just nine years after the completion of Mark I,
Whirlwind I and the IAS machine were leading the research front and UNIVAC I
and the IBM 701 were both commercially available. In 1970 there are some 70
companies in the business of computer manufacturing. IBM has about 70% of the
market and its nearest competitor, Honeywell with its newly purchased GE division,
has about



We will close this chapter with a synopsis of the history of machine
organization up to 1953 and a few remarks about what followed. At this point
the reader has surely noticed that a large fraction of the "big ideas" of modern
machines were in use by 1953. In fact a good many of them were thought about
by Babbage, 100 years earlier. Babbage had proposed a machine organization
with a memory, arithmetic unit, control unit and I/O facilities. He invented
a parallel arithmetic unit with anticipatory carry logic and an overflow alarm.
He also used an index register for loop counting and it worked in parallel with
the arithmetic unit. Between them, Babbage and Lady Lovelace proposed a good
many programming ideas which were similar to those in current use. Unfortunately,
they were a hundred years ahead of the technology.

In fact the vacuum tube and Eccles-Jordan flip-flop circuit were both
invented in the first quarter of the 20th century but were not employed until
25 years later in ENIAC. After the feasibility of large general purpose computers
had been demonstrated using electric relay and mechanical technology,
the events of World War II caused the US and British governments to provide
the funds for a good deal of computer research and development. The earlier
radar efforts certainly provided many engineering and technology ideas.

By 1953, most of what Babbage had proposed was implemented. Machine
speed was the main thing that would have surprised Babbage. He proposed a one
second add and a one minute multiply. In fact several tens of microseconds
were all that addition required and multiplication was about an order of
magnitude slower. The clever memory hierarchy ideas of the Manchester group
as well as the notion of a stored program would have impressed, if not surprised,
Babbage.

The computer scientist of 1970 should pause to notice the wealth
of innovations which had been demonstrated by 1953. The multiprocessor with
remote job entry at Bell Labs, the 8 µs core memory at MIT, the proposal of
microprogramming by Wilkes in 1951: all of these sound like current subjects.

Many topics had been sharply debated in the 1940's, including synchronous
vs. asynchronous operation, bit serial vs. word parallel arithmetic, decimal vs.
binary and fixed vs. floating point number representation. Several of these
subjects are still debated, or "settled" by providing both. It should be
noted that asynchronous operation as pioneered by Stibitz and followed through
the IAS machine, has largely disappeared. The extra control hardware and time
required for "reply backs" between elementary operations became unreasonable as
machine speeds increased. It is also interesting to note that while early machines
(Zuse and Stibitz) had floating point hardware, it had largely disappeared by
1955, not to return for several years. Von Neumann had been instrumental in
this, arguing that proper scaling was easy if one sufficiently understood his
problem; otherwise he shouldn't be computing in the first place. His argument
contained one genuinely unfortunate flaw: few users since have understood
their calculations as von Neumann understood his. In any case, the "philosophy
of machine design" papers written in the 1940's often read in part as if they
had been written last year.

Not that all ideas had been proposed by 1953. Some inventions big and small
that came after 1953 will close this chapter. The transistor and integrated
circuit certainly provided the biggest technology changes and with them came
remarkable system speedups. Memories with extra tag bits, indirect addressing,
and phased or interleaved banks were to follow as was modern paging hardware.
This led to complex multiprogramming and time sharing systems. Fancy terminals
have greatly aided some users. Faster arithmetic algorithms and pipelined
arithmetic units as well as program look ahead have contributed to faster
computation. Stack machines have led to a variation in addressing as well as fast
compilation. As we said at the beginning, things have become much more
complicated and hardware and software organization have become deeply intertwined.
In 1953, software was in a rather simple and pure state. Symbolic assemblers were
common and high level languages were being discussed. Fortunately no one had
thought about software operating systems.

FOOTNOTES



1. [11] contains most of the available Babbage references. Also [5] contains
   a fair amount about Babbage.

2. Much of this material was obtained from [14].

3. [1], pages 359 and 3^7, contains accounts of the work of Zuse. In [9] on
   pages 508 and 650 one can read further details including an article
   by Zuse himself.

4. [2], pages 1 and 69, contains articles about the activities at Bell Labs.
   [12], page 41, is a very good discussion of the Bell Labs Machine. On page 91
   of [12] there is an interesting philosophical paper by Stibitz.

5. [12] is a complete description of the machine.

6. ENIAC is discussed in [12] page 31 and [1] page 97.

7. EDSAC is discussed in [3], also in [18] by Wilkes who was the designer of
   EDSAC.

8. MADM and its preceding developments are discussed by the designers in [5]
   page 117.

9. A complete discussion of the IAS machine design is contained in [7].

10. See [16] for a discussion of Whirlwind.

Chapter 2
Processor Design



2.1.1 Overall Design Questions

A computer system designer must solve one problem in many forms 
and at many levels: What is the least expensive way to provide a given
function to the user of the machine? The function may be a low level
detail or it may be an overall system characteristic. The function may
be of an entirely logical nature or it may involve the speed with which
something is accomplished. The function may be stated in terms of a
user's problem or in terms of the computer itself. The possible functional
specifications are endless so let us turn to the question of cost. Except
in rare cases the designer must do things as cheaply as possible, subject
to the functional constraints specified for the system. Cost savings may
be made by using less expensive parts and by reducing the number of parts.
This often involves a number of trade offs, particularly because cheaper
parts are usually slower and judiciously adding more parts generally speeds
things up. Since overall costs are usually all that matter, cost and speed
trade offs may be made between various units of the overall system. In
any case, it is usually bad practice to include features merely because they
are exotic (although some machines may appear to contradict this). Functions
should be of justifiable use to the customer and the overall cost should be
as low as possible.

Given these rather obvious remarks, the question remains: how does
one go about designing a computer system? We shall attempt to answer that
question in a fairly general way by discussing a number of computer functions
and how they are interrelated. We shall attempt to discuss general
principles and then relate them to some real machines. Our overall
approach will be from the inside out; we shall start with the arithmetic
unit, primary memory and the control unit. These will be followed
by overall system discussions.

2.1.2 Arithmetic and Logical Unit

If we regard consideration of the arithmetic and logical unit
as the first design problem, then a number of decisions at this level will
be reflected throughout the system. In practice the various parts of the
machine affected would be considered simultaneously. Here we shall
restrict our attention to one part at a time.

In terms of cost and speed we must concern ourselves with the
kind of circuits used as well as how many parts are required. Circuit
parameters of interest are switching speed, fan-in and fan-out limitations,
power dissipation, noise immunity, reliability and cost. Interrelations
between parts required by the functions desired must be compatible with
such things as layout on boards with respect to number of wiring levels,
cross talk, cooling, and repairability.

The user requirements specified may be rather vague. Problems 
in different contexts tend to place a variety of demands on the arithmetic 
unit — some problems requiring one thing and others something else. In any 
case, large numerical computation tends to be the most severe burden for the 
arithmetic unit so we shall discuss some details of this. 

First, one must decide which operations are to be performed.
Addition, subtraction, multiplication and division are typical, although
there are many variations on these. In the future, more complicated functions
may be built into machines, e.g. trigonometric functions, log, exp, or
n-ary summation. Initially we shall restrict our attention to addition.

First we must decide if we are going to actually add or just 
do a table lookup to get the result. This latter strategy has sometimes 
been employed (cf. IBM 1620) in slow, small machines; large, high speed 
machines usually have the ability to compute the sum of two numbers 
using some kind of sequential logic. The form of the numbers turns out 
to be quite important in a number of respects. By form we mean the 
number of digits, the number system and whether or not some kind of
explicit exponent is used. 

The number of digits dictates the word length of memory as well 
as the arithmetic unit and its registers. This can be quite important in 
terms of overall system cost. Users can often give estimates of the
required word length in terms of the maximum round-off error tolerable for
certain calculations. It is usually desirable to make the word length a
multiple of the character size (byte) used in the computer system and this
has usually been either 6 or 8 bits in binary machines. In the early days
some internal decimal machines were built (e.g. IBM 650) although
these are quite rare now and we shall restrict our attention to internal
binary machines. For numerical calculating the range of 32 to 64 bits has
been common. The possibility exists of choosing a standard word length and 
then providing arithmetic operations on double or half words. This has 
often been done to try to satisfy a wider class of users. 

Choosing a word length typically requires choosing both an exponent
and fraction size in most modern machines used for numerical computation.
Von Neumann argued against floating point hardware and built a 40-bit fixed
point machine. Later, most companies built floating point machines with
40 or fewer total bits to the chagrin of many numerical users. Typically,
from 6 to 16 bits of exponent are provided. As more complex numerical
computations are performed, users are less happy with normalized
arithmetic. Several unnormalized or significance arithmetic schemes have
been proposed and implemented.

Finally, the number system chosen can greatly influence the speed 
and gate count of the arithmetic unit. The well-known polynomial representation 
is commonly used, although a redundant form of this has quite desirable
properties. Also the residue number system has interesting properties which 
are useful in a theoretical way as well as for some applications. 

We shall attempt to discuss several of these issues and to contrast 
some of them with others. The reader should be forewarned that no pat answers
are forthcoming. Some fairly detailed results are available but the choices 
between alternatives must be dictated by individual design requirements. 

2.3 Number Systems

2.3.1 Polynomial Numbers

Numbers may be coded in a variety of ways. For example, the polynomial
number

$$p(r,k) = \sum_{i=0}^{k-1} d_i r^i, \qquad 0 \le d_i < r,$$

represents a k digit integer with radix r. If r = 10 we have a
decimal number, e.g.

$$p(10,4) = 3 \times 10^3 + 7 \times 10^2 + 1 \times 10^1 + 9 \times 10^0 = (3719)_{10}.$$

We shall use the radix subscript notation when necessary to avoid ambiguity.
As another example, if r = 2 we have a binary number, e.g.

$$p(2,3) = 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = (101)_2 = (5)_{10}.$$

Finally, if r = 16 we have a hexadecimal number, e.g.

$$p(16,3) = 1 \times 16^2 + 9 \times 16^1 + 15 \times 16^0.$$

To avoid confusion, the substitutions $A = 10_{10}$, $B = 11_{10}$, $C = 12_{10}$,
$D = 13_{10}$, $E = 14_{10}$, $F = 15_{10}$ are often used. Thus for our example,
$(19F)_{16} = (415)_{10} = (0001,1001,1111)_2$. Since four bits (binary digits) can
be used to represent the 16 possible coefficients required in a hexadecimal
number, an easy conversion from binary to hexadecimal may be made. In the last
example this can be seen by simply reading off groups of four bits in the binary
form and rewriting them as hexadecimal coefficients of appropriate powers
of 16.
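
The grouping rule just described can be sketched in a few lines of Python. This is only an illustration: the function name `binary_to_hex` is not from the text, and the sketch assumes the input is a string of 0 and 1 characters.

```python
def binary_to_hex(bits: str) -> str:
    """Convert a binary digit string to hexadecimal by reading groups of four bits."""
    hex_digits = "0123456789ABCDEF"
    # Pad with leading zeros so the length is a multiple of four.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # Each four-bit group is one hexadecimal coefficient.
    return "".join(hex_digits[int(group, 2)] for group in groups)


print(binary_to_hex("000110011111"))  # the example (0001,1001,1111) in base 2
```

Because each group of four bits maps to exactly one hexadecimal digit, no arithmetic on the whole number is needed.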

Because of the ease of building physical devices with two stable
internal states, a radix of some power of two is often chosen for computer
number representation. Binary, octal, and hexadecimal are common choices.

The above numbers were all integers, but real numbers are easy to
write as polynomials by letting the summation range over negative as well
as non-negative values. Thus

$$p(r,k,j) = \sum_{i=-j}^{k-1} d_i r^i, \qquad 0 \le d_i < r,$$

is a j + k digit real number, base r.

Examples of decimal and binary real numbers are

$$p(10,3,2) = 9 \times 10^2 + 0 \times 10^1 + 4 \times 10^0 + 7 \times 10^{-1} + 3 \times 10^{-2} = 904.73$$

and

$$p(2,2,3) = 1 \times 2^1 + 0 \times 2^0 + 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} = (10.101)_2 = (2.625)_{10}.$$

Note that they have j digits after the decimal and binary point, respectively.
2.3.2 Signed Digit Numbers

The polynomial numbers of the last section used non-negative
digits only. There are good reasons to allow each digit to have its own
sign, as we shall see in a later discussion of arithmetic operations. Many
possible definitions of signed digit numbers could be given, but we choose
the following.

A signed digit polynomial number is given by

$$sp(r,j,k,\max|d_i|) = \sum_{i=-j}^{k-1} d_i r^i, \qquad -\max|d_i| \le d_i \le \max|d_i|,$$

where r > 2, and

$$\frac{r+1}{2} \le \max|d_i| \le r - 1, \quad \text{if r is odd},$$

$$\frac{r}{2} + 1 \le \max|d_i| \le r - 1, \quad \text{if r is even}.$$

For example, if we choose r = 10 and let $\max|d_i| = \frac{r}{2} + 1 = 6$, we have

$$sp(10,2,3,6) = \sum_{i=-2}^{2} d_i 10^i, \qquad -6 \le d_i \le 6,$$

$$= 3 \times 10^2 + (-6) \times 10^1 + (-3) \times 10^0 + 6 \times 10^{-1} + 2 \times 10^{-2} = 300 - 60 - 3 + .6 + .02 = 237.62.$$

The sign of such a number is the sign of the highest power nonzero 
digit, and negative numbers are formed by changing the sign of each digit. 
Thus no explicit sign is required. Note that the algebraic value of a signed 
digit polynomial number is zero if and only if all $d_i = 0$.
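
As a rough illustration of these properties, the following Python sketch evaluates a signed digit number, reads off its sign from the highest power nonzero digit, and negates it digit by digit. The function names and the list layout (most significant digit first) are assumptions made for this example only.

```python
from fractions import Fraction


def sd_value(digits, r, j):
    """Exact value of signed digits (most significant first) with j digits after the radix point."""
    k = len(digits) - j  # number of digits before the radix point
    return sum(Fraction(d) * Fraction(r) ** (k - 1 - n) for n, d in enumerate(digits))


def sd_sign(digits):
    """Sign of the number: the sign of the highest power nonzero digit (0 if all digits are zero)."""
    for d in digits:
        if d != 0:
            return 1 if d > 0 else -1
    return 0


def sd_negate(digits):
    """Negate the number by changing the sign of every digit."""
    return [-d for d in digits]


digits = [3, -6, -3, 6, 2]          # radix 10, two digits after the point
print(float(sd_value(digits, 10, 2)), sd_sign(digits))
print(float(sd_value(sd_negate(digits), 10, 2)))
```

Exact rationals (`Fraction`) are used so that the fractional digits are not disturbed by binary floating point rounding.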

2.3.3 Residue Numbers

The residue number system uses an implicit definition for each
number rather than the explicit polynomials of the previous sections. Before
discussing this system we will review a few definitions.

We say that a is congruent to r, modulo m, and write

$$a \equiv r \pmod{m}$$

if for integers a, m and r there is an integer k such that

$$a = r + mk.$$

In this congruence relation, r is called the residue and m the modulus of
the number a. We shall concern ourselves only with the least positive
residue $r_{\ell p}$, defined by $0 \le r_{\ell p} < m$. Thus if m = 2 and a = 5 we have

$$5 \equiv 5 \pmod{2}, \quad k = 0$$

$$5 \equiv 3 \pmod{2}, \quad k = 1$$

$$5 \equiv 1 \pmod{2}, \quad k = 2$$

$$5 \equiv -1 \pmod{2}, \quad k = 3$$

and $r_{\ell p} = 1$. Clearly $r_{\ell p}$ is unique for any a and m. Finally we recall that
two integers are relatively prime if their greatest common divisor is 1.

The residue number system represents an integer as a concatenation
of the least positive residues of that integer with respect to a set of
relatively prime moduli. For example, let 2 and 3 be the moduli, then we can
represent the integers 0, ..., 5 as follows:

    N    r_lp (mod 2)    r_lp (mod 3)    Residue number N

    0         0               0                00
    1         1               1                11
    2         0               2                02
    3         1               0                10
    4         0               1                01
    5         1               2                12

In general, if we are using k moduli $m_1, \ldots, m_k$, then we can represent
$\prod_{i=1}^{k} m_i$ distinct numbers in the residue number system. If we had
chosen moduli that were not relatively prime this would not be true.
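
The table and the counting claim can be checked with a short Python sketch. The helper names are illustrative, not from the text, and the moduli (2, 4) are chosen here only as an example of moduli that are not relatively prime.

```python
from math import gcd, prod


def to_residues(n, moduli):
    """Residues of n (each in the range 0..m-1) with respect to each modulus."""
    return tuple(n % m for m in moduli)


def pairwise_coprime(moduli):
    """True if every pair of moduli has greatest common divisor 1."""
    return all(gcd(a, b) == 1 for i, a in enumerate(moduli) for b in moduli[i + 1:])


moduli = (2, 3)
for n in range(prod(moduli)):
    print(n, to_residues(n, moduli))

# With moduli that share a factor, distinct integers collide on the same residues.
print(len({to_residues(n, (2, 4)) for n in range(8)}))
```

With moduli 2 and 3 all six residue pairs are distinct; with 2 and 4 only four distinct pairs occur among eight integers.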
2.4 Machine Representation of Numbers

2.4.1 Precision and Machine Radix

A great variety of formats have been proposed and used to store
and operate on numbers inside computers. If machine numbers have a word
length of w digits (excluding sign), we say that integers may be represented
with w digits of precision. It is important to distinguish "precision" in
this sense from the meanings of such words as "accuracy" or "significance".
Thus numbers may be represented to 20 digits of precision. But if the
measuring device from which they were obtained was only accurate to 3 digits,
only 3 digits of the 20 are accurate. The other 17 may have been "extrapolated"
by a meter reader.

We now consider the meaning of the word "radix" in computer terms.
A machine is built of elements, each having a number of different internal
states. Let us say that each element can represent v values. Almost all
current machines are built using physical devices with two stable states. These
may be assembled into v value elements with v = 2 or with some other value of
v, say v = 10. Thus, while most machines are made from two state physical devices,
a number of current machines have binary (v = 2) as well as decimal (v = 10)
arithmetic capabilities. When we wish to clearly denote a machine radix in terms
of v value elements we shall write $r_v$ instead of r.

2.4.2 Fixed and Floating Point Numbers

Integers are stored in most binary machines as a sign bit and w
digits of integer. Thus, the range of w digit integers in a radix $r_v$ machine
is

$$-r_v^{w} < i(r_v, w) < r_v^{w}$$

with both plus and minus zero included.

Numbers in this form need not be regarded as integers. Obviously the
radix point may be assumed to be anywhere in the number. Or it may be assumed
to be a fixed number of zeros to the right or left of the word. Wherever it
is assumed to be, it is fixed by the programmer (as in slide rule computation).
This number representation in computers is thus called either integer or fixed
point form. A fixed point number which is not an integer clearly has the range

$$-r_v^{w+s} < \mathrm{fi}(r_v, w, s) < r_v^{w+s}$$

where s (a signed integer) is a scale factor assumed by the user.

Since the late 1950's most big machines have provided arithmetic
units which operate on integer as well as floating point or real number forms.
Such forms usually represent a signed fraction and a signed exponent. Assume
two signs plus w = e + f digits are used, where e is the number of digits
of the exponent and f is the number of digits of the fraction. Suppose we have
a machine with radix $r_v$. An exponent $e_1$ of e digits and a fraction $f_1$ of f
digits may take the forms

$$e_1 = \mathrm{fi}(r_v, e, s_1)$$

$$f_1 = \mathrm{fi}(r_v, f, s_2).$$

In most machines $s_1 = 0$ so exponents are regarded as integers and $s_2 = -f$ since
the radix point is assumed to be at the left end of each word. Thus

$$e_1 = i(r_v, e)$$

and

$$f_1 = \mathrm{fi}(r_v, f, -f)$$

are assumptions that we shall make unless otherwise noted in our subsequent
discussion.

Up to this point we have discussed forms of the fraction and exponent
but we have not mentioned the base to which the exponent is raised. This is
often referred to as the radix of machine numbers by users and we shall denote it
by $r_b$. In many machines $r_b = r_v = v$, but this is not always so. It is also popular
to choose $r_b = r_v^k$ for some small integer k. For example in the IBM 360 floating
point operations $v = r_v = 2$, but $r_b = 2^4 = 16$, and it is referred to as a
hexadecimal floating point machine. If $r_b = r_v^k$ then our distinction between
radices may seem pedantic because collections of k digits in $r_v$ may be
regarded as digits in $r_b$ (recall our earlier discussion of conversion from
binary to hexadecimal).

Now we can express floating point machine numbers (with $s_1 = 0$
and $s_2 = -f$ as above) as

$$r_v^{-f} \times r_b^{-(r_v^e - 1)} \le \mathrm{f\ell}(r_v, r_b, e, f) < 1 \times r_b^{(r_v^e - 1)}$$

or

$$\mathrm{f\ell}(r_v, r_b, e, f) = \pm 0$$

or

$$-1 \times r_b^{(r_v^e - 1)} < \mathrm{f\ell}(r_v, r_b, e, f) \le -r_v^{-f} \times r_b^{-(r_v^e - 1)}.$$

Note that the intervals represented contain only a finite number of reals.
We shall adopt the notation that if $r_v = r_b$, both will be denoted by r. For
example if r = 2 we have

$$2^{(1-f-2^e)} \le \mathrm{f\ell}(2,e,f) < 2^{(2^e-1)}$$

or

$$\mathrm{f\ell}(2,e,f) = \pm 0$$

or

$$-2^{(2^e-1)} < \mathrm{f\ell}(2,e,f) \le -2^{(1-f-2^e)}.$$

We say that the precision of a floating point number is determined
by f and its range is determined by $r_b$ and e.
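
A small Python sketch can make the dependence of range on the exponent base concrete. It assumes the conventions stated above (radix point at the left of an f digit fraction, a signed e digit integer exponent); the function name `fl_range` is hypothetical.

```python
from fractions import Fraction


def fl_range(r_v, r_b, e, f):
    """Smallest and largest positive magnitudes for an f digit fraction and e digit exponent."""
    max_exp = r_v ** e - 1                                   # largest exponent magnitude
    smallest = Fraction(1, r_v ** f) * Fraction(1, r_b ** max_exp)
    largest = (1 - Fraction(1, r_v ** f)) * r_b ** max_exp   # fraction of all (r_v - 1) digits
    return smallest, largest


lo2, hi2 = fl_range(2, 2, e=3, f=5)
lo16, hi16 = fl_range(2, 16, e=3, f=5)
print(lo2, hi2)                 # binary exponent base
print(float(lo16), float(hi16)) # hexadecimal exponent base, same e and f
```

With the same e and f, raising the exponent base from 2 to 16 widens the range considerably while the number of fraction digits is unchanged.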

2.4.3 Normalized and Unnormalized Numbers

The computer stored form of a floating point number is not necessarily
unique. Thus we have

$$\mathrm{f\ell}(2,e,f) = f_1 \times 2^{e_1} = 2^{-a} f_1 \times 2^{a+e_1} = \mathrm{f\ell}(2,e,f)$$

if $\|a + e_1\| \le e$, $\|f_1\| \le f$, and $\|2^{-a} f_1\| \le f$, where $\|x\|$ is the number of digits in x.

For example, if e = 3 and f = 5

$$\mathrm{f\ell}(10,3,5) = .03210 \times 10^{001} = .32100 \times 10^{000} = .00321 \times 10^{002}.$$

To make the stored form of floating point numbers unique, some standard form
may be chosen. Very often this is the normalized form of a number which we
shall denote by $\mathrm{nf\ell}(r,e,f)$. This means that if the number being represented
is nonzero, the first digit to the right of the radix point is nonzero.
By properly adjusting the exponent, any nonzero floating point number can be
normalized as we did above using an adjustment factor a. If the radix point is
assumed to be at the left end of the fraction, then clearly we obtain maximum
precision for fractions using normalized forms.
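
The normalization step can be sketched as follows. This is a minimal illustration assuming a fraction stored as a list of digits with the radix point at its left end; the name `normalize` is hypothetical, and exponent underflow is deliberately ignored.

```python
def normalize(digits, exponent):
    """Shift leading zeros out of a fraction digit list, lowering the exponent to compensate."""
    digits = list(digits)
    if not any(digits):
        return digits, 0          # zero has no normalized form; exponent 0 is an arbitrary choice
    while digits[0] == 0:
        digits = digits[1:] + [0] # shift the fraction one digit to the left
        exponent -= 1             # keep the represented value unchanged
    return digits, exponent


print(normalize([0, 3, 2, 1, 0], 1))
print(normalize([0, 0, 3, 2, 1], 2))
```

Both calls yield the same stored form, which is the uniqueness that normalization is meant to provide.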

It is not always the case that users want to do normalized floating 
point calculations. Hardware and software aids for performing unnormalized or 
significance arithmetic are often provided. In this case some adjustment a 
is used so that the normalized number is shifted a digits to the right. 
Roughly speaking such a number may be said to have a significance of f-a 
digits. The point of providing significance arithmetic is that often the user 
starts out with numbers of less significance than the f available on his 
machine. Also the significance of all his numbers is usually not the same.
In such cases it may be very misleading to compute using the full f digits
of the machine and to deliver an f digit result. Rather, the significance of
the result should be expressed as a function of the significance of the input
data. Machines with significance arithmetic features provide proper adjustment 
for each operation. 

2.4.4 Multiple Precision Representation

Whatever word length is provided by machine designers will prove 
inadequate for some users. Thus multiple precision hardware or software is 
often built. If n word precision is provided, then n memory locations must be
fetched per operand. In multiple precision floating point operations it may 
seem desirable to use an exponent of the same size as that used for single 
precision. However it is often the case that only an f digit arithmetic unit 
is available. Thus, each word in the multiple precision representation is 
used as an f, e pair. The exponents are then adjusted to reflect the position 
of each component in the longer number. 
2.5.1 Floating Point Arithmetic Definitions

The following definitions should be intuitively clear.

$$\mathrm{f\ell}_1(r,e,f) \pm \mathrm{f\ell}_2(r,e,f) =
\begin{cases}
(f_1 \pm f_2 \times r^{-(e_1-e_2)}) \times r^{e_1} & \text{if } e_1 > e_2 \\
(f_1 \times r^{-(e_2-e_1)} \pm f_2) \times r^{e_2} & \text{if } e_1 \le e_2
\end{cases}$$

$$\mathrm{f\ell}_1(r,e,f) \times \mathrm{f\ell}_2(r,e,f) = (f_1 \times f_2) \times r^{e_1+e_2}$$

$$\mathrm{f\ell}_1(r,e,f) / \mathrm{f\ell}_2(r,e,f) = (f_1 / f_2) \times r^{e_1-e_2}$$

A number of difficulties may arise in terms of machine representation of the results
of these arithmetic operations. In the case of add or subtract the exponent of the
result is the same as one of the original exponents but one of the arguments must be
shifted (adjusted) a distance equal to the magnitude of the difference of the
exponents. This can cause digits to flow off the right end of a number or machine
register and we shall call it fraction underflow. In case both fractions have a
high order 1, a 1 is propagated off the left end of the number or machine register
and we shall call this fraction overflow.

In the case of multiplication and division the exponents are added and
subtracted, respectively. In case a positive exponent gets too large we shall
refer to it as exponent overflow, and exponent underflow will mean that a negative
exponent exceeds the e digits in magnitude.

These various kinds of exceptions should always be used to trigger an
alarm to the user. Provisions should be provided to allow him to take appropriate
action. With most modern compilers and operating systems, the actions can be
taken automatically. For example, certain values should be saved for the user to
study and the job may or may not be continued.

2.5.2 Machine Addition

First let us restrict our attention to addition of nonnegative
integers. For example

$$n_1(r,k) + n_2(r,k) = \sum_{i=0}^{k-1} d_{1i} r^i + \sum_{i=0}^{k-1} d_{2i} r^i = \sum_{i=0}^{k-1} (d_{1i} + d_{2i}) r^i.$$

Since each digit is required to be less than r, $d_{1i} + d_{2i}$ must be regarded as
a pair of digits, commonly called a sum and carry digit, so
$d_{1i} + d_{2i} = r c_{i+1} + s_i$, where $c_{i+1} = 1$ if $d_{1i} + d_{2i} \ge r$, otherwise $c_{i+1} = 0$.
It will become important below to decide precisely what we mean by "the addition
of two numbers". Is it sufficient to generate only the $c_{i+1}$ and $s_i$ digits for
all i? Or must we worry about propagating the carry across the result? Note that

      316
    + 253
      569

can be evaluated with all zero carry bits whereas

      316
    + 694
     1010

requires a carry to propagate across all positions. Apparently the latter
process is much slower than the former. On the other hand, the generation of
$c_{i+1}$ and $s_i$ at each position would seem to take the same time for each position
and if these could be saved for the next addition perhaps an overall time saving
would be possible. We shall return to these questions later.
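
The distinction between generating the sum and carry digits and propagating the carries can be sketched in Python. Both function names are illustrative, and digits are held least significant first as an assumption of this example.

```python
def digitwise_add(a, b, r=10):
    """Per-position sum digits s_i and carry digits c_(i+1), with no carry propagation."""
    sums, carries = [], []
    for da, db in zip(a, b):
        t = da + db
        carries.append(1 if t >= r else 0)
        sums.append(t - r if t >= r else t)
    return sums, carries


def propagate(sums, carries, r=10):
    """Ripple the saved carries through the sum digits; the final carry is appended as a new digit."""
    result, carry_in = [], 0
    for s, c in zip(sums, carries):
        t = s + carry_in
        result.append(t % r)
        carry_in = c + t // r     # carry generated here or produced by the incoming carry
    result.append(carry_in)
    return result


s, c = digitwise_add([6, 1, 3], [4, 9, 6])   # 316 + 694, least significant digit first
print(s, c)
print(propagate(s, c))                       # digits of 1010, least significant first
```

The first step takes the same time at every position and could run in parallel; only the second step is inherently sequential in this simple form.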
Note that the second sum above overflowed in the sense that two three
digit numbers led to a four-digit one. With a fixed word length machine this
would require the raising of some kind of alarm to cause appropriate action to
be taken.

Now let us consider the addition of nonnegative floating point
numbers. For example

$$101. \times 2^{10} + 011. \times 2^{01} \quad (= 5_{10} \times 4_{10} + 3_{10} \times 2_{10} = 26_{10})$$

$$= 101. \times 2^{10} + 001.1 \times 2^{10}$$

$$= 110.1 \times 2^{10} = 6.5_{10} \times 4_{10} = 26_{10}.$$

This addition process required an extra step before the addition to equalize
the exponents and align the binary points of the two arguments. In particular,
the smaller exponent was set equal to the larger one and the fractional part
was properly shifted to compensate. Notice that the process underflowed the three
digits allowed for the fraction part of the floating point numbers.
2.5.3 Normalized Floating Point Addition

We will now consider the process of floating point addition assuming
normalized arguments. Let

$$\mathrm{nf\ell}_1(2,3,5) = .10100 \times 2^{011}$$

$$\mathrm{nf\ell}_2(2,3,5) = .10100 \times 2^{001}$$

The addition process requires equal exponents so we must first align the
fractions and adjust the smaller exponent. Thus we have

$$\mathrm{nf\ell}_1 + \mathrm{nf\ell}_2 = .10100 \times 2^{011} + .00101 \times 2^{011} = .11001 \times 2^{011} = \mathrm{nf\ell}_3.$$

In general we may overflow the left end (by at most one digit) and this requires
a post addition adjustment or renormalization step. This may cause a digit to drop
off the right end of the word. If we have the choice of losing a digit at the
right or left ends, clearly we must choose to drop the low order digit
(otherwise the result would be nonsense). There is a choice of simply dropping
the lost digit, called truncation, or adding 1/2 to the highest order digit
about to be dropped, called rounding, and generally rounding is preferable. The
errors introduced by these processes are called truncation error and round-off
error, respectively.

While we have discussed the error introduced by post addition
normalization, the preaddition alignment may also introduce error. For example,
let

$$\mathrm{nf\ell}_1(2,3,5) = .11100 \times 2^{011}$$

$$\mathrm{nf\ell}_2(2,3,5) = .10111 \times 2^{001}$$

Then

      .11100    x 2^011   = nfl_1
    + .0010111  x 2^011   = nfl_2
     1.0000111  x 2^011   = nfl_3

The fraction underflow digits (the final 11 at the right) are somewhat in doubt,
being the sum of the low order digits given for $\mathrm{nf\ell}_2$ and assumed low order
zeros for $\mathrm{nf\ell}_1$. The leading 1 at the left is the fraction overflow
discussed above. To finish the addition we must shift right to renormalize
the fraction and add one to the exponent to adjust it. Finally we
shall round by adding an appropriate 1. Thus we obtain

     1.0000111  x 2^011
      .10000111 x 2^100     shift and adjust
    +       1               round
      .10001    x 2^100     result

In decimal notation

$$\mathrm{nf\ell}_1 = 7, \quad \mathrm{nf\ell}_2 = 1.4375, \quad \text{and} \quad \mathrm{nf\ell}_3 = 8.5$$

and our machine addition process has introduced a round-off error of +.0625.
Note that if we had truncated instead of rounding, $\mathrm{nf\ell}_3 = 8$ and the truncation error
would be -.4375.

Generally error may be expressed in absolute or relative terms. Both
of our above examples were absolute error, $e_a$, the actual value of the error.
Perhaps of more interest is the relative error, $e_r$, expressed as the ratio of
absolute error to the correct value (or approximately to the computed value).
Thus for the above example, the relative error due to round off is

$$e_{rr} = \frac{e_a}{\mathrm{nf\ell}_1 + \mathrm{nf\ell}_2} = \frac{.0625}{8.4375} = .0074$$

while the relative error due to truncation is

$$e_{rt} = \frac{e_a}{\mathrm{nf\ell}_1 + \mathrm{nf\ell}_2} = \frac{-.4375}{8.4375} = -.0519.$$
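
The align, add, renormalize, and round sequence for a sum of 7 and 1.4375 with five fraction bits can be traced with the following Python sketch. It is an illustration only: the function names are hypothetical, fractions are held as f bit integers, both operands are assumed positive and normalized, and the first operand is assumed to have the larger exponent.

```python
from fractions import Fraction


def add_normalized(f1, e1, f2, e2, f=5, rounding=True):
    """Add positive normalized binary numbers (f bit integer fraction, exponent); requires e1 >= e2."""
    guard = e1 - e2                    # alignment shift distance
    total = (f1 << guard) + f2         # exact sum, keeping the shifted-out digits
    width = f + guard                  # fraction digits currently held
    exp = e1
    if total >> width:                 # fraction overflow: one digit ran off the left end
        width += 1
        exp += 1
    drop = width - f                   # low order digits that do not fit in f digits
    if rounding and drop > 0:
        total += 1 << (drop - 1)       # add 1 at the highest order digit about to be dropped
    result = total >> drop
    if result >> f:                    # rounding itself overflowed; renormalize once more
        result >>= 1
        exp += 1
    return result, exp


def value(frac, exp, f=5):
    """Exact value of an f bit fraction with a binary exponent."""
    return Fraction(frac, 2 ** f) * Fraction(2) ** exp


exact = value(0b11100, 3) + value(0b10111, 1)
rounded = value(*add_normalized(0b11100, 3, 0b10111, 1))
truncated = value(*add_normalized(0b11100, 3, 0b10111, 1, rounding=False))
print(float(exact), float(rounded), float(truncated))
print(float((rounded - exact) / exact), float((truncated - exact) / exact))
```

The last line prints the two relative errors, which show how much smaller the round-off error is than the truncation error for this pair of operands.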
2.5.4 Floating Point Multiplication

We shall briefly consider floating point multiplication, emphasizing
the steps before and after the actual multiplication. Assuming normalized
numbers, let

$$\mathrm{nf\ell}_1 \times \mathrm{nf\ell}_2 = (f_1 \times r^{e_1}) \times (f_2 \times r^{e_2}) = f_1 \times f_2 \times r^{e_1+e_2} = \mathrm{nf\ell}_3.$$

Normalized fractions may be multiplied with no possibility of overflow. At the
same time the exponents may be added and this process overflows if $\|e_1 + e_2\| > e$.
In this case the user should be notified that he has exceeded the machine's
capacity. The product of two f digit fractions will generally be of length 2f
and this means that extra register length should be provided in the arithmetic
unit. The user may want to save both the high and low order bits of the product.
More likely he will simply want to save a rounded single length result. Note
that the rounding process must be followed by a renormalization and exponent
adjustment step.

The time consuming process of multiplying the fractions may be 
done in various ways. Typically some kind of repeated addition loop is 
executed. Thus one operand is shifted and added to itself under the control 
of the other operand. 
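
A repeated addition loop of this kind might look as follows in Python. This is a generic shift-and-add sketch on nonnegative integers, not a description of any particular machine; the function name is illustrative.

```python
def shift_add_multiply(a, b):
    """Multiply nonnegative integers by adding shifted copies of a under control of the bits of b."""
    product = 0
    while b:
        if b & 1:          # the current low order bit of b selects whether to add
            product += a
        a <<= 1            # shift the multiplicand one position left
        b >>= 1            # move on to the next bit of the multiplier
    return product


f1, f2 = 0b10100, 0b11000          # two 5 bit fractions held as integers
p = shift_add_multiply(f1, f2)
print(bin(p), p.bit_length())      # the product needs up to 2f = 10 bits
```

The double length product is why extra register length is needed before the result is rounded back to f digits.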



2.6 Bit Level Design Options 

To this point we have discussed a number of elementary ideas
about computer numbers and arithmetic. With this as background we shall
turn our attention to some overall questions about computer arithmetic.
To make the discussion tractable we shall limit ourselves mainly to
floating point addition. As we mentioned earlier the design of a
machine must be regarded from the user's point of view as well as the
designer's. The designer wants an "inexpensive" unit. Users have a
great variety of performance desires. We shall discuss performance and
cost in terms of a number of parameters. Our objective is to give the
reader some ideas of an overall nature rather than to discuss specific
designs. Thus we shall present analyses of: shifting as a function of
radix; precision and accuracy as functions of radix; roundoff as a function
of word length and number of arguments; overall speed as a function of
number representation, hardware characteristics, and word lengths.

2.6.1 Optimal Choice of $r_v$

We shall consider several aspects of the choice of radix. First
is the optimization of $r_v$ in machine structure terms.

Assume that a variety of physical devices are available at
various costs. Their speeds may be assumed to be equal or cost may be
written as a function of speed. In any case, assume that an entire machine
is made of various "black boxes" which are radix r devices. Also assume
that the cost per bit of such devices can be expressed as (here $r = r_v$)

$$\alpha r^{\beta}.$$

We must qualify this assumption because memory and processor costs per bit
are usually quite different due to different technologies. This could be
reflected in $\alpha$. Furthermore, it is probably true that if we simulated radix
r "black boxes" using radix 2 hardware, the cost per bit of memory and processor
would have quite different $\beta$ values. If the assumption is valid for some part
of a machine, we proceed as follows. Let some number $N = r^n$ be represented
by n digits in radix r. Then

$$\log_e N = n \log_e r$$

and the cost of N (storage, processing, etc.) may be expressed as

$$c = \alpha r^{\beta} n = \alpha r^{\beta} \frac{\log_e N}{\log_e r}.$$

To minimize this cost with respect to r we set

$$\frac{dc}{dr} = \alpha \log_e N \, \frac{\beta r^{\beta-1} \log_e r - r^{\beta-1}}{(\log_e r)^2} = 0.$$

Thus

$$\beta \log_e r = 1$$

or

$$r = e^{1/\beta}.$$

Since the second derivative is positive we have an expression
for a minimum cost radix. For example, $\beta = 1$ implies that the bit cost
is proportional to the radix value in a linear way. In this case r = e
and binary or ternary arithmetic are nearly optimal. In fact a binary
radix is optimal in case $\beta = \frac{1}{\log_e 2} = 1.44$.
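
The minimum can be checked numerically with a short Python sketch. It assumes the cost model derived above (a coefficient times a power of the radix, multiplied by the number of digits needed); the function names and the sample value of N are chosen only for illustration.

```python
import math


def cost(r, N, alpha=1.0, beta=1.0):
    """Cost alpha * r**beta * n of holding N, where n = ln N / ln r digits are needed."""
    return alpha * r ** beta * math.log(N) / math.log(r)


def optimal_radix(beta):
    """Radix minimizing the cost above: r = e**(1/beta)."""
    return math.exp(1.0 / beta)


for r in (2, math.e, 3, 4):
    print(round(r, 3), round(cost(r, N=10 ** 6), 3))

print(optimal_radix(1.0))                   # e when the cost is linear in the radix
print(optimal_radix(1.0 / math.log(2)))     # approximately 2
```

With a linear cost the values for radix 2 and radix 3 lie close to the minimum at e, and radix 4 costs the same as radix 2.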
2.6.2 Choice of $r_b$

Next we consider the choice of radix in determining the range
of floating point numbers. Given some fixed hardware representation for
numbers, the e digit exponent and radix $r_b$ allow f digit fractions to be
scaled up and down. $r_b$ is built into the logic of the arithmetic unit,
while e and f determine the machine's word length. The choice of $r_b$
affects the precision and range of normalized floating point numbers.
The following table contains some illustrative examples (assuming $r_v = 2$).



                    r_b = 2                 r_b = 16
    Number        f         e             f         e

     1/16        .1       -011          .0001      000
     1/8         .1       -010          .0010      000
     1/4         .1       -001          .0100      000
     1/2         .1        000          .1000      000
     1           .1        001          .0001      001
     2           .1        010          .0010      001
     4           .1        011          .0100      001


It is immediately clear that fractional parts of hexadecimal numbers may
have leading zeros and still be normalized. On the other hand, when a
shift is necessary, four binary digits are lost per hexadecimal digit.
Thus, we incur a larger loss of precision per shift with hexadecimal.
It should also be clear that fewer different values of exponent are required
for the same range using larger $r_b$. Thus another question is, what $r_b$
makes the most efficient use of total word length? Another obvious question given
N digits is, what sizes of e and f should be used? We shall deal with
each of these matters below.



2.6.2.1 Addition Shift Distances in Practice 

First we consider the loss of precision due to shifting using
higher $r_b$ values. Specifically we shall discuss hexadecimal and binary.

D. W. Sweeney [ ] has analyzed floating point addition in a number of
scientific codes. By tracing about 10 million instruction executions, he
observed that an overall average of about 10% of the instructions executed
were floating point additions. We shall reproduce only a few of his
findings. In particular we are interested in preaddition alignment shifts
and post addition normalization shifts. The values in the table represent
the number of shifts of a particular distance expressed as a percentage of
all cases measured. The numbers added were not necessarily of like sign
and a few unnormalized operations were included.



                          r_b = 2                      r_b = 16
                    Shift                        Shift
                    Distance    Percent          Distance    Percent

    alignment          0         32.64
                                 26.02

    normalization   overflow     19.65           overflow      5.5
                       0         59.38              0         82.35
                      1-4        14.51              1          7.24


As expected, we observe more zero shift cases with higher $r_b$.
In fact, normalization shifts for hexadecimal numbers only occur about 18%
of the time. Comparing the sum of the alignment shifts from 0 to 4 for binary
and from 0 to 1 for hexadecimal slightly favors hexadecimal. Of course the
binary shifts occur in increments of one bit of precision loss. Similar
sums for normalization are almost equal.

Viewed another way, we can observe that the sum of the alignment
percentages for distances 0 and 1 with base 2 is slightly less than the
percentage of distance 0 shifts for base 16. Similarly the distance 0
and 1 normalization shift percentages in binary are slightly more than the
distance 0 shifts for base 16.



2.6.2.2 Distribution and Number of Values as a function of $r_b$

We first study the number of different values representable
using various bases and the distribution of these values. Notice that
when floating point numbers with $r_b = 2$ are required to be in normalized
form, only half of the possible values representable with f bits are
used (just those with a leading 1). When one leading zero is allowed
($r_b = 4$) then 50% more values are representable and so on.

Given e and f bits of exponent and fraction, respectively, there
are $2^e$ different exponents and $2^{f-1}$ different normalized fractions
representable. (We are assuming here that $r_b = r_v = 2$.) Thus the total
number of representable values is $2^{e+f-1}$. Since the largest fraction
representable is approximately 1, the largest binary number representable
is approximately $2^{2^e}$.

Now if $r_b = \beta = 2^i$, numbers have the form $f_\beta \times \beta^{e_\beta}$. To estimate 
the number of values less than the maximum binary number ($r_b = 2$) we observe 
that for some k (assuming $f_\beta \approx 1$) 

$$\beta^k \approx 2^{2^e}.$$

Thus $k \log_2 \beta \approx 2^e$. Now the number of values less than $\beta^k$ is approximately 

$$(2^{f-1} + 2^{f-2} + \cdots + 2^{f-\log_2 \beta})(k+1).$$

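Thus we can write 

$$\text{representation ratio} = \frac{\text{number of } r_b = \beta \text{ values less than } 2^{2^e}}{\text{number of } r_b = 2 \text{ values less than } 2^{2^e}}$$

$$= \frac{2^{f-1}\,(1 + 2^{-1} + \cdots + 2^{-(\log_2 \beta - 1)})\,(k+1)}{2^{e+f-1}}$$

$$= \frac{2^{f-1}\,(1 + 2^{-1} + 2^{-2} + \cdots + 2^{-(\log_2 \beta - 1)})\,(k+1)}{2^{f-1}\,(\log_2 \beta)\,k}$$

$$= \left(\frac{k+1}{k}\right) \cdot \frac{1 + 2^{-1} + 2^{-2} + \cdots + 2^{-(\log_2 \beta - 1)}}{\log_2 \beta}.$$

If $\beta = 16$ and e = 8, then we have $k = \frac{2^8}{\log_2 16} = 2^6$ 
and representation ratio $\approx \frac{15}{32} \approx .47$. 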

By a similar analysis it may be shown that there are about 1.88 times as 
many hexadecimal values as binary values representable using fixed e and 
f. Thus we conclude that the hexadecimal values lying in the range of the 
binary values number about half as many as the binary values, and about 
three quarters of the hexadecimal values lie outside the binary range. 
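
The two ratios quoted above can be checked numerically. The sketch below evaluates the closed form of the representation ratio and, as an assumption about what the "similar analysis" amounts to, compares total counts as (normalized base-β fractions) × 2^e against 2^(e+f-1). The function names are illustrative.

```python
def representation_ratio(beta: int, e_bits: int) -> float:
    """((k+1)/k) * (1 + 2^-1 + ... + 2^-(log2(beta)-1)) / log2(beta)."""
    i = beta.bit_length() - 1          # log2(beta), beta assumed a power of two
    k = 2 ** e_bits / i                # from k * log2(beta) = 2^e
    series = sum(2.0 ** -j for j in range(i))
    return (k + 1) / k * series / i


def total_value_ratio(beta: int) -> float:
    """Ratio of all base-beta values to all binary values for the same e and f.

    Assumes 2^f - 2^(f-i) normalized fractions in base beta = 2^i and
    2^(f-1) in base 2, with the same 2^e exponents in both cases.
    """
    i = beta.bit_length() - 1
    return 2 * (1 - 2.0 ** -i)


print(round(representation_ratio(16, 8), 3))  # 0.476
print(total_value_ratio(16))                  # 1.875
```

Dividing the first figure by the second gives about 0.25, the share of hexadecimal values that fall inside the binary range.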



2.6.2.5 $r_b$, f, and accuracy 



The accuracy with which some form of floating point numbers 
represents the real numbers may be studied by examining the intervals 
between the floating point numbers. Thus, if $fl_i$ and $fl_{i+1}$ denote a pair 
of adjacent representable floating point numbers, then 

$$\frac{fl_{i+1} - fl_i}{fl_i}$$

is a relative interval measure of accuracy. It turns out that this has 
$r^{-f+1}$ as its maximum value. In [9], for this as well as other accuracy 
measures, the question of floating point number representations is studied. 
It is shown that for fixed N = e + f, the choice of $r_b = r_e$ always provides 
as much accuracy and more exponent range than some $r_b = r_e^k$. In other 
words, while f may be made larger at the expense of e with $r_b = r_e^k$, the 
tradeoff with accuracy is not a good one. 

That this is true is not hard to see by studying the exponent 
range, E, as a function of f and i, where $i = \log_2 \beta$, $f_2 = f_\beta$, and 
N = e + f. Thus we have 

$$E(f,i) = i\,(r^{N-f} - 1).$$

Assuming that f > i, for the same accuracy we study the ratio of exponent 
ranges of an $r_b = r_e$ number to an $r_b = r_e^i$ number: 

E(f,i) ; " i Ir'^-^-l 



2.6.5 Rounding and Truncation 

Assume we have a computed floating point number with f digits of 
fraction to be retained plus some low order digits which must be disposed of. 
The lowest order digit of the f digits represents $r^{-f}$, so the digits to the 
right represent a quantity whose value is less than $r^{-f}$. Regardless of 
the process used to dispose of these low order digits, the error on each 
step is less than $r^{-f}$. Intuitively it seems desirable to minimize this 
error on each step. If it is necessary to introduce a positive error on 
some steps and a negative error on other steps, it would also seem 
intuitively desirable to try to minimize the algebraic sum of these errors; 
that is, to minimize the bias in disposing of the extra digits. 

First we consider the error due to simply dropping and forgetting 
the extra digits; this is usually called truncation error. With an f digit 
fraction, the truncation error is 

$$0 \le \epsilon_t < r^{-f}.$$

The bias introduced by truncation is the sum of these errors over many steps. 
If the average error is one half of the maximum then the bias over n additions 
is 

$$\frac{n}{2}\, r^{-f}.$$




An intuitively better procedure is rounding the f bits to be saved 
using the high order bit of those to be disposed of. The error so introduced 
is usually called round off error. In this case the error is 

$$0 \le \epsilon_r \le \tfrac{1}{2}\, r^{-f}$$

or at most one half that of the truncation error $\epsilon_t$. This may be seen by 
considering a floating point number 

    |      f_1      |  a  |  β  |
    |<----- f ----->|<-1->|

where $\|f_1\| = f$. To round we add $\frac{r}{2}$ to it in the a position. If $a\beta = 0$ 
then clearly no carry is generated and by dropping $a\beta$, no error is introduced. 
If $a < \frac{r}{2}$ and β is as large as possible, then no carry is propagated and $a\beta$ is 
dropped, introducing an error of at most $\frac{1}{2} r^{-f}$. If $a \ge \frac{r}{2}$ then a one carries 
to $f_1$, thus adding $r^{-f}$ to $f_1$. In this case the smallest that $a\beta$ can be is 
$\frac{r}{2}\, r^{-(f+1)}$. Thus we introduce a maximum error equal to the amount added to $f_1$ 
minus the least amount lost by dropping $a\beta$, i.e. 

$$r^{-f} - \frac{r}{2}\, r^{-(f+1)} = r^{-f} - \frac{r^{-f}}{2} = \frac{r^{-f}}{2}.$$

In the case of r = 2, the error introduced by this process is $-\beta$ if a = 0 and 
$2^{-f} - 2^{-(f+1)} - \beta$ if a = 1. If β = 0 then the bias 
is $0 + (2^{-f} - 2^{-(f+1)}) - 0 = 2^{-(f+1)}$. If 
$\beta \ne 0$, then for each $\beta_i$ we can find a $\beta_j$ such that $\beta_i - (2^{-(f+1)} - \beta_j) = 0$. 
Thus, if we assume that all values of β are equally likely, the total bias is 
just $2^{-(f+1)}$. 
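
The error bound and the total bias for r = 2 can be confirmed by enumerating every pattern of discarded bits. This is a minimal sketch under stated assumptions: g discarded bits are kept exactly (the text does not fix their number), errors are taken as result minus true value, and the function name `discard_errors` is illustrative.

```python
from fractions import Fraction


def discard_errors(f: int, g: int):
    """Return (truncation errors, rounding errors) over all g-bit discarded tails."""
    unit = Fraction(1, 2 ** f)                      # weight of the last retained bit
    trunc, rounded = [], []
    for pattern in range(2 ** g):
        tail = Fraction(pattern, 2 ** (f + g))      # value of the discarded bits
        a = pattern >> (g - 1)                      # high order discarded bit
        trunc.append(-tail)                         # the tail is simply dropped
        rounded.append((unit if a else 0) - tail)   # a carry of 2^-f when a = 1
    return trunc, rounded


trunc, rounded = discard_errors(f=4, g=3)
print(max(abs(e) for e in trunc))    # 7/128, below 2^-4
print(max(abs(e) for e in rounded))  # 1/32, equal to 2^-5
print(sum(rounded))                  # 1/32, the total bias 2^-(f+1)
```

The rounding errors sum to 2^-(f+1) regardless of g, since every nonzero tail with a = 0 is cancelled by a partner with a = 1, leaving only the case a = 1, β = 0.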

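Let us consider the possibility of reducing the round off bias to zero. 
Consider the following table, in which the two discarded bits have weights 
$2^{-(f+1)}$ and $2^{-(f+2)}$: 

    Number Presented    Rounded Result    Error
    X 00                X                 0
    X 01                X                 $-2^{-(f+2)}$
    X 10                X + 1             $+2^{-(f+1)}$
    X 11                X + 1             $+2^{-(f+2)}$

    bias = $2^{-(f+1)}$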
Here the second and fourth numbers may be paired to introduce zero 
bias (cf. discussion above) and here (as in general) the third case (10) 
introduces the bias. If it were possible to detect the case β = 0, a = 1 and 
round this in only half of the cases, a zero bias rounding procedure would 
exist. For example, some random bit could be used to make the choice in the 
a = 1, β = 0 case. 

A scheme which is easier to implement than rounding and not much 
more difficult to implement than truncation is the jamming of a 1 into the 
last bit position. The error and bias of this are between those of rounding 
and truncation. 
2.7 Addition Speed vs. Gate Count vs. Number Representations 

A simple way of adding two numbers is to add two digits at a 
time, generate sum and carry digits and go on to the next pair of digits. 
Such schemes are generally referred to as serial by digit and were often 
used in early machines [10]. To speed up the addition process one 
naturally considers adding several digits at once. In general, this 
leads to questions about propagating carry digits. Using the residue 
or signed digit representation, carry propagation is not a problem, as 
we saw earlier, but these are both "unusual" number systems and we shall 
deal with them later. Another question that comes up is the possibility 
of adding n numbers together at once and considering the speed and cost 
of this process compared with adding two numbers. 

By the early 1960's a number of fast parallel addition algorithms 
were in common use. A number of alternatives for binary addition are 
compared in [9] by Sklansky and summarized in Figures 11, 12, 14, 13, 
and 16 and Table I there. Sklansky shows an n bit serial adder with 7 
gates and 4n gate delay time steps. He also has a full ripple carry adder 
with 7n gates and 2(n+1) time steps. Several look ahead carry units are 
described including a full look ahead conditional sum adder with 
$5n(2 + \lceil \log_2(n+1) \rceil)$ gates and $2(1 + \lceil \log_2(n+1) \rceil)$ time steps. It is assumed that 
all gates have a fan in of 2. Sklansky also proposes and contrasts three criteria 
for performance. 

At about the same time, MacSorley [5] surveyed various 
binary arithmetic algorithms. While he does not give functions describing 
their speed and gate count, his Table II contains numbers which compare 
several schemes for n = 50 and n = 100. From this one can infer that his 
full ripple algorithm requires 8n gates and 2n time units while his full 
look ahead algorithm requires $2\lceil \log_2 n \rceil$ time units and less than $2n\lceil \log_2 n \rceil$ 
gates. MacSorley also discusses a number of multiplication and division 
ideas. Another paper appeared in late 1961 [14] which also compares a number of 
addition schemes and proposes that several are "better" than Sklansky's 
earlier conditional sum technique. Table II of this paper compares 
several schemes. In this paper as well as [5], the notion of "gate" 
seems to be less well defined than in [9]. In [16] Lehman again compares 
a number of schemes. 

In any case, [14] led to an exchange of correspondence in April 
1963, beginning with [15]. A number of assumptions are discussed at some 
length in this correspondence. Sklansky discusses some bounds on add 
time independently of any particular circuits but which do include fan in 
and fan out considerations. 

We can roughly summarize the state of the art for binary addition 
in the early 1960s as follows: 

    Adder Type         Gate Count                      Time Units
    Bit Serial         7                               4n
    Full Ripple        8n                              2n
    Full Look-Ahead    $2n\lceil \log_2 n \rceil$      $2\lceil \log_2 n \rceil$

This leads to the obvious question: Can one demonstrate an addition 
circuit faster than $2\lceil \log_2 n \rceil$ steps at any cost? One should also be prepared 
to consider unusual number systems at this point. 
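
To make the summary concrete, the rough figures above can be evaluated for the word lengths n = 50 and n = 100 used in the MacSorley comparison. This is only a sketch of the tabulated formulas; the function name `adder_costs` is illustrative.

```python
def adder_costs(n: int) -> dict:
    """Return {adder type: (gate count, time units)} from the rough summary."""
    lg = (n - 1).bit_length()  # exact integer ceiling of log2(n) for n >= 1
    return {
        "bit serial": (7, 4 * n),
        "full ripple": (8 * n, 2 * n),
        "full look-ahead": (2 * n * lg, 2 * lg),
    }


for n in (50, 100):
    for name, (gates, time_units) in adder_costs(n).items():
        print(f"n={n:3d}  {name:16s} gates={gates:5d}  time={time_units:4d}")
```

For n = 100 the full look-ahead figures are 1400 gates and 14 time units, against 800 gates and 200 time units for full ripple, which shows the gate count paid for the logarithmic time.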

Winograd [11] studied the time required to perform 
addition under a rather general set of assumptions. We shall particularize 
things somewhat in the present discussion. Roughly speaking, Winograd's 
definitions are wide enough to include most known number systems and addition 
algorithms, except signed digit addition. One must be concerned about the 
encoding, adding, and decoding of numbers to ensure that the addition is 
"really performed" by the addition algorithm and not by the encoding and 
decoding process. In any case, his Theorem 1 is proved for gates with unit 
delay and a fan in of f. 

Theorem 1 The time T to add two n digit numbers is 

$$T \ge \lceil \log_f 2n \rceil.$$

From this he obtains for k arguments: 

Corollary 2 The time T to add k n digit numbers is 

$$T \ge \lceil \log_f kn \rceil.$$

Winograd also constructs a multiplication scheme which approaches 
this bound as shown in his Theorem 2. However, the technique uses residue 
numbers and so overflow is not detected in the time given. In [12] 
Winograd discusses (the time required for multiplication as well as) the 
time required to detect an overflow in the addition of two residue numbers. 
In Theorem 9 he shows that the overflow detection time is $T \ge \lceil \log_f n \rceil$. 
Winograd summarizes his results in a simple way in [15]. 

Comparing these results with the full look-ahead scheme mentioned 
earlier it is clear that Winograd's lower bound requires about half the 
time of a full look-ahead adder. But overflow detection requires the 
same time so nothing is saved. The question remains, however: can Winograd's 
bound be approached by some scheme with overflow detection? 

Brent [4] considers this problem and establishes that a kind 
of carry look-ahead adder can be constructed which for large n approaches 
Winograd's bound. Furthermore, his Theorem 1 outlines a scheme for constructing 
the adder with order of $n \log_2 n$ gates, although he does not exhibit the scheme. 
This is a favorable improvement on the full look-ahead numbers we tabulated 
earlier. 
We remarked earlier that Winograd's formulation of this speed, cost 
problem did not include the signed digit number system. It was shown by 
Avizienis in [2] and discussed in more generality in [1] that addition 
could be performed in a fixed amount of time independently of n, the number 
of digits. [1] also discusses the number of gates required for various 
schemes, but the redundancy required complicates direct comparison with the 
binary cases discussed earlier. 

Avizienis also discusses the addition of k numbers and derives a 
time $T = \lceil \log_{f/2} k \rceil + 1$, $f > 2$, as well as some gate count functions. 

When signed digit arithmetic is performed, it is assumed that all 
numbers are encoded before the calculation begins. Then signed digit 
arithmetic is performed. Finally the numbers are decoded, a process which 
propagates the last carries. 
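
The fixed-time property of signed digit addition can be illustrated with a small sketch. It assumes radix 10 with the redundant digit set -6..6 and the usual two-step rule (form an interim sum and a transfer digit in every position at once, then add each incoming transfer). These parameters and the names `sd_add` and `sd_value` are illustrative choices, not taken from [1] or [2].

```python
RADIX = 10
A = 6          # assumed digit set: every digit lies in [-A, A]
W_MAX = A - 1  # interim sums are kept within [-W_MAX, W_MAX]


def sd_add(x, y):
    """Add two signed digit numbers (equal-length lists, most significant first)."""
    n = len(x)
    z = [xi + yi for xi, yi in zip(x, y)]  # digitwise sums, all positions at once
    # Transfer digit leaving each position; it depends on that position only.
    t = [1 if zi >= W_MAX else -1 if zi <= -W_MAX else 0 for zi in z]
    w = [zi - RADIX * ti for zi, ti in zip(z, t)]  # interim sums
    # Each result digit absorbs the transfer from its right neighbour;
    # that addition can never generate a further carry.
    return [t[0]] + [w[i] + (t[i + 1] if i + 1 < n else 0) for i in range(n)]


def sd_value(digits):
    """Decode a signed digit list to an ordinary integer."""
    value = 0
    for d in digits:
        value = value * RADIX + d
    return value


x = [1, -4, 6]  # 100 - 40 + 6 = 66
y = [0, 5, 6]   # 56
s = sd_add(x, y)
print(s, sd_value(s))  # [0, 1, 2, 2] 122
```

Every result digit depends only on two adjacent digit positions of the operands, so the addition time is independent of n. The carry propagation reappears only in `sd_value`, which plays the role of the final decoding step.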
2.8 Multioperation Speedups 

We have seen that arithmetic operation speeds may be reduced to 
a function of the speed of parts from which the arithmetic units are built. 
We have also seen that lower bounds may be established on the times required 
to perform arithmetic in various number systems. Do these observations 
imply that the speedup of computers has been reduced to waiting for 
faster parts from which new machines may be built? Obviously not, since 
we may consider operating on many pairs of numbers simultaneously. Recall 
that Babbage planned to do arithmetic and indexing at once and that 
Menabrea suggested performing more than one arithmetic operation at once. 

If we restrict ourselves to the addition and multiplication 
operations, we can now regard an arithmetic processor as a collection or 
combination of multiplier and adder units. Suppose we have an arithmetic 
processor containing two adders and a multiplier and wish to evaluate 
$(a+b) \times (c+d)$. Then the two sums can be formed simultaneously. Thus the 
arithmetic processor would appear to be able to add twice as fast as each 
adder can in fact add. In the CDC 6600 this idea is implemented, cf. Ch. V 
of [8]. 

It is also possible to achieve faster arithmetic by what is 
called pipeline processing. If some operation requires T time units, then 
by cutting the logic into K stages and connecting them through registers, it 
is possible to introduce a new pair of operands every $\frac{T}{K}$ time units. 
Similarly, results emerge from such a pipeline at the rate of one result per 
$\frac{T}{K}$ time units. This idea is used in the 360/195 as well as the CDC STAR and 
TI ASC machines. 
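
A short calculation shows how the T/K rate plays out over a stream of operands. The sketch assumes an idealized pipeline: the first result appears after T time units, each later result follows T/K after the previous one, and register delays are ignored. The function names are illustrative.

```python
from fractions import Fraction


def pipeline_time(T: int, K: int, m: int) -> Fraction:
    """Time to push m operand pairs through a K-stage pipeline of latency T."""
    stage = Fraction(T, K)      # one new operand pair enters every T/K
    return T + (m - 1) * stage  # fill the pipe once, then one result per stage time


def serial_time(T: int, m: int) -> int:
    """Time for m operations on an unpipelined unit of latency T."""
    return T * m


T, K, m = 8, 4, 100
print(pipeline_time(T, K, m))      # 206
print(serial_time(T, m))           # 800
print(pipeline_time(T, K, m) / m)  # 103/50, approaching T/K = 2
```

The speedup approaches K only for long operand streams, since the first result still takes the full T time units.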

Another approach to speedup by machine organization is to 
sequence many simple (one of each operation at a time) arithmetic processors 
from one program. If we have to add k pairs of numbers and we add them 
all at once then it takes T time units, but the effective speed per 
addition is $\frac{T}{k}$ time units. This speedup is analogous to that in the 
pipeline case, but in practice k may be made much larger for parallel than 
for pipeline machines. This is the approach being taken in the construction 
of Illiac IV [3]. 

Just as we studied the maximum speed of addition (and multiplication) 
for single arithmetic units, the speeds of more complex functions can 
be studied for multi arithmetic function processors, e.g., the cases discussed 
above. As a model of the most general case of these, let us consider an 
unlimited number of adders and multipliers which operate simultaneously. 
Each operation (add or multiply) takes one time unit and the processors 
can communicate their outputs to any other processor in zero time. We 
also ignore memory times. 

How fast can such a machine multiply two matrices or evaluate 
a polynomial? Two N × N matrices may be multiplied in $1 + \lceil \log_2 N \rceil$ steps, 
instead of the usual $2N^3$, by the following scheme. We must form $N^2$ inner 
products, each of dimension N. Consider $N^3$ multipliers each of which 
performs one multiplication on the first step. On the second step we 
start to form the sums for the inner products. After one addition using 
$\frac{N^3}{2}$ adders, we have $\frac{N^3}{2}$ results. On the second addition step we use half of 
these adders to obtain $\frac{N^3}{4}$ results. After $\lceil \log_2 N \rceil$ such steps we have 
$N^2$ results, namely, the elements of the product matrix. 
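
The step count of this scheme can be reproduced by simulating it level by level. In the sketch below each pass of a comprehension stands for one time step in which all the listed operations happen simultaneously; the function name `parallel_matmul` is illustrative.

```python
def parallel_matmul(a, b):
    """Multiply square matrices, returning (product, parallel time steps)."""
    n = len(a)
    # Step 1: all N^3 products are formed at once.
    sums = {(i, j): [a[i][k] * b[k][j] for k in range(n)]
            for i in range(n) for j in range(n)}
    steps = 1
    # Addition steps: each one halves the partial sums of every inner product.
    while any(len(v) > 1 for v in sums.values()):
        sums = {key: [sum(v[p:p + 2]) for p in range(0, len(v), 2)]
                for key, v in sums.items()}
        steps += 1
    product = [[sums[(i, j)][0] for j in range(n)] for i in range(n)]
    return product, steps


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
product, steps = parallel_matmul(a, b)
print(product)
print(steps)  # 3, which is 1 + ceil(log2(3))
```

When N is not a power of two, an odd partial sum is simply carried to the next level, which is why the ceiling appears in the step count.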

It has been shown by Pan [7] that 2n operations are required 
to evaluate a polynomial of degree n. Thus, for a serial machine, Horner's 
Rule is optimal. However, it is easy to see that the form 

$$P_n(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$

requires only $\lceil \log_2 n \rceil$ steps to evaluate all powers, one step to multiply 
by the coefficients, and $1 + \lceil \log_2 n \rceil$ steps to sum the terms. Thus, by 
introducing some "redundant" operations we can obtain the result in 
$2(1 + \lceil \log_2 n \rceil)$ time steps. This is a crude upper bound for a multiarithmetic 
unit machine because some additions can be performed before the final 
multiplications are performed. A lower bound for a multiarithmetic unit 
machine is $1 + \lceil \log_2 n \rceil$, following Pan, but it is not obvious how to achieve 
this. In [6] Muraoka shows how to approach it. Improvements of Muraoka's 
result may be found in [17] and [18]. 
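
The crude upper bound can be checked by simulating the three phases and counting time steps. This is a sketch of the straightforward scheme only, not of Muraoka's method; the names `horner` and `parallel_poly` are illustrative, and each loop pass stands for one step in which all listed operations run simultaneously.

```python
def horner(coeffs, x):
    """Serial evaluation with n multiplies and n adds; coeffs[p] multiplies x**p."""
    result = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        result = result * x + c
    return result


def parallel_poly(coeffs, x):
    """Return (value, parallel time steps) for the three-phase scheme."""
    n = len(coeffs) - 1
    powers = {0: 1, 1: x}
    known, steps = 1, 0
    # Phase 1: doubling. One step extends x^1..x^known to x^1..x^(2*known).
    while known < n:
        top = min(2 * known, n)
        new = {p: powers[known] * powers[p - known] for p in range(known + 1, top + 1)}
        powers.update(new)
        known = top
        steps += 1
    # Phase 2: multiply every power by its coefficient in one step.
    terms = [coeffs[p] * powers[p] for p in range(n + 1)]
    steps += 1
    # Phase 3: sum the n + 1 terms pairwise.
    while len(terms) > 1:
        terms = [sum(terms[p:p + 2]) for p in range(0, len(terms), 2)]
        steps += 1
    return terms[0], steps


coeffs = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # degree n = 8
value, steps = parallel_poly(coeffs, 3)
print(value == horner(coeffs, 3), steps)  # True 8
```

For n = 8 the simulation takes 8 steps, matching 2(1 + ⌈log₂ n⌉), against 16 operations in sequence for Horner's Rule.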